Generative AI: Revolutionising Creativity and Innovation

Exploring the Transformative Impact of Generative AI on Modern Technology and Art

Generative AI, or gen AI, is AI that can produce original content, such as text, photos, videos, audio, or software code, in response to a user’s prompt or request.

Generative AI is built on deep learning models: advanced machine learning models that mimic the learning and decision-making processes of the human brain. These models work by finding and encoding the relationships and patterns in massive amounts of data, then using that information to understand users’ natural-language requests and respond with relevant new content.

Although artificial intelligence (AI) has been a popular topic in technology for the past decade, generative AI, and specifically the arrival of ChatGPT in 2022, has catapulted AI into the global spotlight and set off an unprecedented wave of AI innovation and adoption. Generative AI poses very real risks and challenges, but it also offers individuals and organisations enormous productivity benefits, and businesses are moving forward, exploring how the technology can improve their internal workflows and enhance their products and services. According to research by the management consulting firm McKinsey, a third of companies are already using generative AI regularly in at least one business function.¹ Industry analyst Gartner projects that by 2026, more than 80% of businesses will have deployed generative AI applications or used generative AI application programming interfaces (APIs).

How generative AI works

Training

Generative AI begins with a foundation model: a deep learning model that serves as the basis for many different kinds of generative AI applications. The most widely used foundation models today are large language models (LLMs), created for text generation, but there are also foundation models for image, video, sound, and music generation, as well as multimodal foundation models that support several kinds of content.

To build a foundation model, practitioners train a deep learning algorithm on huge volumes of raw, unstructured, unlabeled data, for example, terabytes of data pulled from the internet or some other massive data source. During training, the algorithm performs and evaluates millions of “fill in the blank” exercises, trying to predict the next element in a sequence: the next word in a sentence, the next region of an image, or the next command in a line of code. It continually adjusts itself to minimize the difference between its predictions and the actual data (the “correct” result).
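To make that objective concrete, here is a minimal sketch of next-element (“fill in the blank”) training in PyTorch. The toy corpus and bigram-style model are illustrative stand-ins for the terabytes of data and billions of parameters involved in real foundation-model training.

```python
import torch
import torch.nn as nn

# Toy corpus and vocabulary (a stand-in for terabytes of web text).
words = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(words))
stoi = {w: i for i, w in enumerate(vocab)}
ids = torch.tensor([stoi[w] for w in words])

# Tiny "language model": embed the current word, score every possible next word.
model = nn.Sequential(nn.Embedding(len(vocab), 16), nn.Linear(16, len(vocab)))
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

inputs, targets = ids[:-1], ids[1:]  # each word is asked to predict its successor
for step in range(200):
    logits = model(inputs)           # predicted scores over the vocabulary
    loss = loss_fn(logits, targets)  # gap between predictions and the actual data
    optimizer.zero_grad()
    loss.backward()                  # adjust parameters to shrink that gap
    optimizer.step()
```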

What remains after this training is a neural network of parameters (encoded representations of the entities, patterns, and relationships in the data) that can generate content on its own in response to prompts or other inputs.

This training process is compute-intensive, time-consuming, and expensive: it requires thousands of clustered graphics processing units (GPUs) and weeks of processing, all of which typically costs millions of dollars. Open-source foundation model projects, such as Meta’s Llama-2, enable generative AI developers to skip this step and its associated expense.

Tuning

A foundation model is, to use a metaphor, a generalist: it knows a great deal about a wide range of content types, but it often cannot produce specific kinds of output with the required accuracy or fidelity. For that, the model must be tuned to a particular content generation task. This can be done in several ways:

Fine-tuning

Fine-tuning involves feeding the model data specific to the content generation application: questions or prompts the application is likely to receive, paired with the corresponding correct, appropriately formatted answers. To build a customer service chatbot, for example, a development team would assemble hundreds or thousands of documents containing labeled customer service questions and correct answers, then feed those documents to the model.
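As a rough illustration, the sketch below shows what such labeled data might look like and how each pair feeds the same next-token objective used in pretraining. The file name, prompt template, and example answers are all hypothetical, and real fine-tuning pipelines add tokenization, batching, and loss masking.

```python
import json

# Hypothetical labeled examples for a customer-service chatbot.
examples = [
    {"prompt": "How do I reset my password?",
     "response": "Go to Settings > Security and select 'Reset password'."},
    {"prompt": "What is your refund policy?",
     "response": "Purchases can be refunded within 30 days of delivery."},
]

# A common convention is one JSON object per line ("JSONL").
with open("finetune_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# During fine-tuning, each pair is concatenated into one training sequence,
# and the same next-token objective used in pretraining is applied to it.
for ex in examples:
    training_text = f"User: {ex['prompt']}\nAssistant: {ex['response']}"
    # ...tokenize training_text and take a gradient step, as in pretraining
```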

Fine-tuning is labor-intensive. Developers often outsource the work to companies with large data-labeling workforces.

Reinforcement learning with human feedback (RLHF)

In RLHF, human users assess the model’s generated content, and those assessments are used to update the model for greater accuracy or relevance. Often, RLHF involves people “scoring” different outputs generated from the same prompt, but it can be as simple as having people type or speak corrections back to a chatbot or virtual assistant.
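A common ingredient in RLHF pipelines is a reward model trained on those human preference scores. Below is a minimal PyTorch sketch of the standard pairwise preference loss; the two scalar reward scores are made up for illustration, where a real system would compute them with a neural network over entire responses.

```python
import torch
import torch.nn.functional as F

# Hypothetical reward-model scores for two answers to the same prompt;
# human raters preferred answer A over answer B.
reward_chosen = torch.tensor([1.3], requires_grad=True)    # preferred answer's score
reward_rejected = torch.tensor([0.4], requires_grad=True)  # rejected answer's score

# Pairwise preference loss: minimizing -log(sigmoid(r_chosen - r_rejected))
# pushes the preferred answer's score above the rejected one's.
loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()
loss.backward()  # in a real system, this would update the reward model's weights
```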

Generation, evaluation, more tuning

Developers and users continually assess the outputs of their generative AI apps, and the model can be tuned further, even as often as once a week, for greater accuracy or relevance. (The foundation model itself is updated much less frequently, perhaps once a year or every 18 months.)

Retrieval augmented generation (RAG) is another way to improve a gen AI app’s performance. RAG provides a framework for augmenting and improving the model’s outputs by drawing on relevant sources outside the training data. With RAG, a generative AI app always has access to the most current information. As a bonus, RAG makes those additional sources visible and transparent to users, whereas the knowledge encoded in the original foundation model is not.
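Here is a minimal retrieve-then-generate sketch of the RAG pattern. The document store, the keyword-overlap retriever, and the prompt template are illustrative stand-ins; production systems typically use vector embeddings for retrieval and pass the assembled prompt to a real LLM.

```python
# Hypothetical document store; real systems index far larger collections.
documents = [
    "Our support line is open 9am-5pm, Monday through Friday.",
    "Refunds are processed within 30 days of the return request.",
    "The latest firmware, v2.4, was released on 1 March.",
]

def retrieve(query: str, docs: list, k: int = 2) -> list:
    """Rank documents by word overlap with the query and return the top k."""
    q_words = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    # Prepending retrieved passages grounds the model's answer in current,
    # inspectable sources rather than only its frozen training data.
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("How long do refunds take?"))  # pass this prompt to an LLM
```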

What generative AI can create

Text

Generative models, transformer-based ones in particular, can produce coherent, contextually relevant text for almost any purpose: creative writing, reports, papers, brochures, emails, website copy, blogs, articles, and instructions. They can also take on time-consuming or repetitive writing tasks (such as drafting document summaries or meta descriptions for web pages), freeing writers’ time for more imaginative, higher-value work.
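As a concrete example, here is a minimal local text-generation call using Hugging Face’s transformers library. The model name and prompt are illustrative choices; any causal language model would work the same way.

```python
from transformers import pipeline

# "gpt2" is an illustrative, freely available model; larger models write far
# better prose, but the calling pattern is the same.
generator = pipeline("text-generation", model="gpt2")

result = generator(
    "Write a one-sentence product description for a solar-powered lamp:",
    max_new_tokens=40,  # cap the length of the generated continuation
    do_sample=True,     # sample for variety instead of greedy decoding
)
print(result[0]["generated_text"])
```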

Images and video

Image generation tools such as DALL-E, Midjourney, and Stable Diffusion can produce realistic images or original artwork, and can also perform style transfer, image-to-image translation, and other image editing and enhancement tasks. Gen AI video tools can turn text prompts into animations and apply special effects to existing footage more quickly and affordably than other approaches.
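For example, here is a hedged sketch of text-to-image generation with Stable Diffusion through Hugging Face’s diffusers library. The model ID and prompt are illustrative, and a CUDA-capable GPU is assumed.

```python
import torch
from diffusers import StableDiffusionPipeline

# Illustrative model ID; downloading the weights requires the diffusers
# library and, for reasonable speed, a CUDA-capable GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,  # half precision to fit consumer GPUs
)
pipe = pipe.to("cuda")

image = pipe("an astronaut sketching on the moon, watercolor style").images[0]
image.save("astronaut.png")
```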

Sound, speech and music

Generative models can synthesize natural-sounding speech and audio content for voice-enabled AI chatbots, digital assistants, audiobook narration, and other applications. The same tools can compose original music that mimics the sound and structure of professionally produced work.

Software code

Gen AI can generate original code, autocomplete code snippets, translate between programming languages, and summarize what a piece of code does.
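As an illustration, here is a minimal sketch of requesting original code from a hosted LLM via OpenAI’s chat completions API. The model name is an illustrative assumption, and an API key is assumed to be configured in the environment.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{
        "role": "user",
        "content": "Write a Python function that checks whether a string is a palindrome.",
    }],
)
print(response.choices[0].message.content)  # the generated code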

Design and art

AI models capable of generating original art and design can also assist with graphic design. Applications include creating dynamic environments, avatars or characters, and special effects for video games and virtual simulations.

Simulations and synthetic data

Generative AI models can be trained to produce synthetic data, or synthetic structures based on real or synthetic data. In drug discovery, for example, generative AI is used to propose molecular structures with desired properties, aiding the design of novel pharmaceutical compounds.
