QbitAI Think Tank, reporting from Aofei Temple
QbitAI | WeChat official account QbitAI
AIGC (AI-generated content) has become a very popular concept recently.
Take Stable Diffusion: give it a short text prompt and it generates a painting in seconds:
"Big chunky Venom" (a huge, solid Venom).
The well-known blogger Dagu Spitzer also used it to "remake" "Huaqiang Buys Melons" with a cast of Hollywood international superstars:
Earlier tools such as Google's Imagen and OpenAI's DALL·E series have likewise become AI content generation tools that are popular among netizens.
Someone even entered an art competition with a painting generated by Midjourney, beat the human entrants to win first place, and angered a number of artists.
But as the saying goes, "technology that can be used is good technology". By making these AIGC technologies so popular, netizens have shown a high degree of recognition of their strength.
On the market side, only one month after the project was released, the company behind Stable Diffusion was valued at 6.9 billion yuan, which is capital's affirmation of AIGC.
So now is the time to take stock of AIGC from multiple angles, such as its technological development path and the direction of its industrial adoption.
To that end, after in-depth research, the QbitAI think tank has officially released the "AIGC/AI Content Generation Industry Outlook Report", which centers on three major questions:
- In terms of technology, what kinds of creation can AIGC already accomplish?
- In terms of value, what else can AIGC do besides directly generating works of art?
- In the future, how will AIGC change the content and related industries?
(see the end of the article to obtain the complete report)
AIGC technology and eight scenario applications
AIGC stands for AI-Generated Content. It refers to technology that, building on AI techniques such as generative adversarial networks (GANs) and large pre-trained models, finds patterns in existing data and generates relevant content through appropriate generalization.
A similar concept is synthetic media, which mainly refers to AI-generated text, images, audio, and so on.
Gartner has also proposed a similar concept, generative AI, which refers to technology that generates new data resembling the original from existing data.
That concept is narrower in scope than AIGC as the QbitAI think tank defines it.
We believe that AIGC is currently moving from simple cost reduction and efficiency gains (represented by generating finance and sports news) toward creating additional value (represented by providing paintings and creative materials), and that cross-modal/multimodal content has become a key development milestone.
From a technology perspective, we believe the following scenarios will be the focus of future development: cross-modal generation across text, image, and video; 2D-to-3D generation; and multimodal understanding combined with generation.
We also believe that within the next three years, two comprehensive AIGC scenarios, virtual human generation and game AI, will mature commercially.
The green parts of the figure below mark the sub-tracks that we believe have the potential for rapid growth within 2-3 years.
Text Generation
Text generation is represented by sub-functions such as structured news writing, content continuation, and poetry writing. Built on NLP technology, it can be regarded as the earliest-developed technology within AIGC, and it has already been widely commercialized in scenarios such as news reporting and conversational bots.
Looking at existing deployments, we divide it into application-type text generation and creative text generation. The former has progressed significantly further than the latter. In addition, in terms of adoption, assisted text writing is currently the most widely deployed scenario.
Most application-type text is structured writing, with customer service chat Q&A and news writing as the core scenarios. The main players include Automated Insights (AP Wordsmith), Narrative Science, textengine.io, AX Semantics, Yseop, Arria, Retresco, Viable, and Lanzhou Technology. It is also a key area of investment for companies that cover the AIGC field broadly, such as Xiaobing Company, Tencent, and Baidu.
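To make "structured writing" concrete, here is a minimal sketch of template-based data-to-text generation for a sports result. It is only an illustration: the `MatchResult` record, its fields, and the wording rules are hypothetical and are not taken from any of the products named above, which typically rely on far richer rules or learned models.

```python
from dataclasses import dataclass


@dataclass
class MatchResult:
    """Hypothetical structured record for one sports result."""
    home: str
    away: str
    home_score: int
    away_score: int
    venue: str


def write_match_report(result: MatchResult) -> str:
    """Turn one structured record into a one-sentence news item."""
    if result.home_score == result.away_score:
        return (f"{result.home} and {result.away} drew "
                f"{result.home_score}-{result.away_score} at {result.venue}.")
    if result.home_score > result.away_score:
        winner, loser = result.home, result.away
        high, low = result.home_score, result.away_score
    else:
        winner, loser = result.away, result.home
        high, low = result.away_score, result.home_score
    # Choose the verb from the score margin so the wording follows the data.
    verb = "edged" if high - low == 1 else "beat"
    return f"{winner} {verb} {loser} {high}-{low} at {result.venue}."


print(write_match_report(MatchResult("Lions", "Tigers", 3, 2, "City Stadium")))
```

Because the input is already structured, the output is predictable and easy to check, which is one reason this kind of text was commercialized early.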
Creative text generation mainly serves sub-scenarios such as story continuation and marketing copy. The text is more open and free-form, requires a degree of creativity and personalization, and places higher technical demands on generation capability.
Representative companies in China and abroad include Anyword, Phrasee, Persado, Pencil, Copy.ai, Friday.ai, Retresco, Writesonic, Conversion.ai, Snazzy AI, Rasa.io, LongShot.AI, and Caiyun Xiaomeng.
Beyond end-to-end text creation, assisted text writing is in fact the most widely available and deployed scenario in China. It is built mainly on material gathering: targeted information collection, preprocessing of text materials, automatic clustering and deduplication, and supplying relevant materials according to the creator's needs.
Representative domestic products include Writing Cat, Gilso Writing Robot, Get Writing, Writing Fox, and Wowo AI Writing.
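The "automatic clustering and deduplication" step can be sketched roughly as follows. This is a minimal illustration under stated assumptions: similarity is measured as Jaccard overlap of three-word shingles, clustering is a greedy single pass, and the 0.5 threshold is arbitrary. The function names are hypothetical, and the products above may work quite differently.

```python
def shingles(text: str, n: int = 3) -> set:
    """Return the set of n-word shingles of a lowercased text."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)}
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_materials(texts, threshold=0.5):
    """Greedy single-pass clustering: a text joins the first cluster whose
    representative (its first member) is similar enough, else starts a new one."""
    clusters = []  # each cluster is a list of indices into texts
    reps = []      # shingle set of each cluster's first member
    for i, text in enumerate(texts):
        current = shingles(text)
        for cluster, rep in zip(clusters, reps):
            if jaccard(current, rep) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
            reps.append(current)
    return clusters


def deduplicate(texts, threshold=0.5):
    """Keep only the first text of each cluster."""
    return [texts[c[0]] for c in cluster_materials(texts, threshold)]


materials = [
    "the central bank raised interest rates by a quarter point today",
    "the central bank raised interest rates by a quarter point on Tuesday",
    "the home team won the cup final after extra time",
]
print(cluster_materials(materials))
print(deduplicate(materials))
```

The first two snippets land in one cluster and only the first is kept, so a writer looking for material sees each story once.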
Image Generation
The traditional approach to image generation is the generative adversarial network (GAN), which consists of two parts: a generator and a discriminator. The generator produces new synthetic data, which is mixed with the original data and passed to the discriminator, whose job is to tell the two apart.
Existing GANs have made breakthroughs in neural network architecture, loss function design, training stability, and mode collapse, improving the detail, internal coherence, and generation speed of the final image.
However, before GANs can be widely used in practice, they still need to solve the following problems: unstable training, large numbers of near-duplicate generated samples, and model structure and compression.
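The tug-of-war between generator and discriminator can be illustrated with the two loss functions of a standard GAN. This is a minimal sketch using the common binary cross-entropy formulation with the non-saturating generator loss. It takes the discriminator's output probabilities as plain numbers rather than training real networks, and the function names are illustrative.

```python
import math


def bce(prediction: float, target: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one predicted probability."""
    p = min(max(prediction, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(real_probs, fake_probs) -> float:
    """The discriminator wants real samples scored 1 and generated ones 0."""
    real_term = sum(bce(p, 1.0) for p in real_probs) / len(real_probs)
    fake_term = sum(bce(p, 0.0) for p in fake_probs) / len(fake_probs)
    return real_term + fake_term


def generator_loss(fake_probs) -> float:
    """The generator wants its samples scored as real (non-saturating form)."""
    return sum(bce(p, 1.0) for p in fake_probs) / len(fake_probs)


# A confident discriminator: low discriminator loss, high generator loss.
print(discriminator_loss([0.9, 0.8], [0.1, 0.2]), generator_loss([0.1, 0.2]))
# A fooled discriminator: the situation is reversed.
print(discriminator_loss([0.5, 0.5], [0.5, 0.5]), generator_loss([0.9, 0.8]))
```

The two losses pull in opposite directions on the same discriminator outputs, which is one intuition for why GAN training is hard to stabilize.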
In 2022, the diffusion model became an important development in image generation, and it even shows signs of overtaking GANs. Compared with other image generation models (such as GANs, VAEs, and flow-based models), diffusion models produce markedly better images while requiring less data.
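The report does not describe how a diffusion model works. As background, the sketch below shows only the forward (noising) half of the standard DDPM formulation, in which an image is gradually turned into Gaussian noise; a trained network then learns to reverse this process. The linear schedule values, the 1000 steps, and the three-value toy "image" are assumptions chosen for illustration.

```python
import math
import random


def linear_beta_schedule(steps: int, start: float = 1e-4, end: float = 0.02):
    """Noise variances beta_1..beta_T, linearly spaced (assumed values)."""
    if steps == 1:
        return [start]
    return [start + (end - start) * t / (steps - 1) for t in range(steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def noise_sample(x0, t, alpha_bars, rng):
    """Draw x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)              # fixed seed so runs are repeatable
pixels = [0.2, -0.5, 0.9]           # a toy "image" of three values
slightly_noisy = noise_sample(pixels, 10, alpha_bars, rng)
almost_pure_noise = noise_sample(pixels, 999, alpha_bars, rng)
print(alpha_bars[10], alpha_bars[999])
```

Early steps keep almost all of the original signal, while by the final step almost none of it remains.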
In 3D content generation, the neural radiance field (NeRF) model has become the new-generation approach.
NeRF represents a scene as an implicit neural radiance field. At render time, a neural network is queried for the scene information at each location, and the results are used to synthesize an image from a new viewpoint. Simply put, NeRF uses deep learning to carry out the 3D rendering task of computer graphics.
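The rendering step can be sketched as compositing samples along a single camera ray, using the standard volume rendering quadrature that NeRF relies on. In a real NeRF the density and color at each sample come from querying the trained network; here they are supplied by hand, so this is only an illustration of the compositing step, and `render_ray` is a hypothetical name.

```python
import math


def render_ray(densities, colors, deltas):
    """Composite samples along one ray, front to back.

    densities: volume density sigma_i at each sample
    colors:    (r, g, b) at each sample
    deltas:    length of the ray segment each sample covers
    Returns the rendered (r, g, b) and the per-sample weights.
    """
    transmittance = 1.0          # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha
        weights.append(weight)
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return tuple(pixel), weights


# Empty space, then a dense red surface, then a blue surface hidden behind it.
pixel, weights = render_ray(
    densities=[0.0, 50.0, 50.0],
    colors=[(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
    deltas=[0.1, 0.1, 0.1],
)
print(pixel, weights)
```

The red surface dominates the pixel because it absorbs nearly all the light before the ray reaches the blue sample, which is how occlusion falls out of the formula.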
Based on this review of the different technical principles, we divide the technical scenarios in image generation into image attribute editing, partial image generation and modification, and end-to-end image generation.
Attribute editing can be understood intuitively as a Photoshop whose barrier to entry has been lowered by AI. Representative companies include Meitu Xiuxiu (Meitu AI Open Platform), Radius5, Photokit, Imglarger, Hotpot, Remove.bg, Skylum (Mask AI), and Photodiva.
Partial image generation and modification refers to changing the composition of part of an image or modifying facial features. A typical example is InsetGAN, launched by Adobe and accepted at CVPR 2022.
End-to-end image generation mainly refers to generating a complete image from a sketch, organically combining multiple images into a new one, and generating a target image from specified attributes.
This part covers two types of scenario: creative image generation and functional image generation. The former mostly takes forms such as NFTs, while the latter mostly appears as marketing posters and interfaces, logos, model photos, and user avatars.
Representative vertical companies and products include Deepdream Generator, Rosebud.ai, AI Gahaku, artbreeder, nightcafe, starryai, wombo, deepart, obvious, Aliluban, ZMO.ai, Datagrid, Shiyun Technology, and Daozi Intelligent Painting System.
Audio Generation
This technology can be applied to creating pop songs, music, and audiobooks, and to scoring video, games, film, and television, greatly reducing the cost of licensing music.
The scenarios we are currently most optimistic about are real-time soundtracks, voice cloning, and the automatic generation of functional music such as music for psychological comfort.
TTS (text-to-speech) is quite mature within AIGC and is widely used in customer service and hardware robots, audiobook production, voice announcements, and similar tasks.
The current technical challenges are how to better convey prosody, the rise and fall of speech, from rich textual information (such as the deeper emotion and semantics of the text), and how to reproduce a voice as a whole from a small amount of a user's personalized data (for example through few-shot transfer learning).
Representative vertical companies include Reflective Audio, iFLYTEK, Spichi (DUI), ReadSpeaker, DeepZen, and Sonantic.
As content media have changed, dubbing for short videos has become an important scenario. Some software can automatically generate narration and dubbing from a document, and more than 150 AI voice presenters with different dialects and timbres are available online. Representative companies include cut-up, Jiuhad dubbing, plus-sounding, and XAudioPro.
Within TTS, voice cloning deserves special attention. The technology is currently used for virtual singers, automatic dubbing, and similar applications. Building on voice IP, it matters greatly to the animation, film, and virtual human industries.
Representative companies include Bibei Technology, Modulate, Overdub, Replika, Replica Studios, Lovo, Voicemod, Resemble AI, Respeecher, DeepZen, Sonantic, VoiceID, and Descript.
…
Due to limited space, the remaining AIGC application sub-scenarios are covered in the full report, which can be obtained at the end of the article.
Overall, however, we believe that how far AIGC applications are adopted across different tracks depends mainly on two factors: the maturity of the specific technology and the barriers to conversion that arise in real-world use.
The following technical elements are worth watching: long text generation, open-ended text generation, the NeRF model, the diffusion model, cross-modal large pre-trained models (supported modal data types, modal alignment architecture design, and supported downstream applications), few-shot learning and self-supervised algorithms, and reinforcement learning and environment learning.
In terms of technology scenarios, we believe the following will take off noticeably in the short term: chat text generation, personalized marketing text, emotional and fine-grained TTS, stitched video generation, text-based AI painting, and voice cloning.
AIGC value and industrial development analysis
In the view of the QbitAI think tank, the value of using AI for content creation comes mainly from five points.
Unlike the prevailing market view, we believe that the last point, personalized and real-time interaction with AI systems, best reflects its potential value.
Although AIGC cannot yet generate content in a precise and controllable way, we believe the future ceiling for this track, in both technology and market size, is very high.
The five main sources of value are listed below in increasing order of importance.
Lowering the threshold for content creation and enlarging the UGC user base
AIGC can take over work such as manual voice recording and image rendering, allowing more people to take part in creating high-value content. This effect is expected to be very pronounced in B2B structured content generation, and B2C services will appear in some scenarios. Cross-modal generation will be the focus going forward.
Improving creation and feedback efficiency, laying the foundation for real-time online interaction
At present, the efficiency gains show up mainly as higher productivity for professionals. Users increasingly demand personalized digital content they can interact with dynamically. Traditional development methods cannot keep up with this growing demand, and content is consumed far faster than it is produced, so AIGC is needed to close the gap between supply and demand.
More importantly, in our view, AI also speeds up the generation of content in response to feedback, which matters greatly for real-time interactive content. This makes it possible to move fast offline interactions between real people online. In other words, AI can take on the social, creative, and collaborative roles of real people, and new scenarios may emerge (such as social and exploration games).
Content consumers now find it increasingly easy to project real emotional needs onto the virtual world, which is expected to generate substantial demand for deep, real-time interaction and a considerable market.
Drawing on massive data for strong creativity and openness, which helps stimulate creative thinking and increases the diversity of content production
Compared with human artists, AI can access and draw on far more data. When content is generated from a prompt, the result leaves more room and freedom for secondary creation.
For example, a generation algorithm can work from specific conditions, or entirely at random, with shapes, color schemes, patterns, or structures that do not exist in reality. This opens up more possibilities for content creation, produces a sense of the "surreal" and the "futuristic", and promotes artistic innovation.
Disassembling and recombining elements from different modalities, changing the logic and form of content production
Through means such as voice cloning and arrangement style extraction, AIGC can break an original object down into its information in different modalities, such as a speaker's facial image, voice, and speech content.
Once recombined, these elements make possible work that could not previously be done because of practical constraints, for example a passer-by's voice paired with professional broadcasting delivery, or faces that better match a particular aesthetic. This removes the limits that real people and real scenes place on how elements can be combined.
Linking to other AI systems or databases to enable highly personalized, high-frequency optimization
Once linked to a specific database (such as customer data updated in real time, market feedback data, or historical statistics on a given topic) or to another AI system (such as a personalized recommendation system), AIGC can adjust the content it generates based on more accurate forecasts and personalized predictions.
For example, it can adjust marketing copy to user habits, adapt generated content to the style of a channel, and optimize content generation using historical data.
In terms of industrial chain analysis, since China's AIGC industry has not yet fully developed and taken shape, we have drawn a map of the industrial chain based on our own understanding.
At present, the upstream of China's AIGC industry still has many gaps, and it consists mainly of data labeling.
We believe that acquisitions by large companies in related businesses may become the mainstream in the future, or that there will be a clearer trend of large companies extending their businesses into this area. However, large companies extend their businesses in order to gain traffic quickly through new selling points and to optimize their core businesses, and they do not pay much attention to fully exploring AIGC's commercial value.
Therefore, until a clear new scenario emerges, we believe this industry is more likely to remain scattered across different content consumption scenarios.
Our analysis of the industry's barriers to entry and core competitive strengths:
- Whether they start from content or from extensions, products ultimately need to come back to the ability to deliver integrated solutions and services
- Avoiding competitive pressure from large companies at later stages
- Deeply binding relationships with the industry
- Building a closed-loop business
Finally, here are the six key conclusions we reached from this research:
The full report can be obtained at the end of the WeChat article:
https://mp.weixin.qq.com/s/VQefNw_TX48mjfiR927NkQ
— End —