With the arrival of the 5G era, big data analytics and applications have drawn wide attention across industries, and new applications are constantly generating huge volumes of unstructured data. How to put this data to use through tools and platforms, and at scale, has become a focus for enterprises.
From September 24 to 25, the first Unstructured Data Summit of 2022 was held, hosted by Zilliz, a vector database company that recently raised $60 million. Themed "Matrix Revolution - Vector Connecting the World", the summit brought Zilliz's core product and R&D teams together with experts from finance, artificial intelligence, the Internet, and other fields to share the development and practical application of unstructured data processing technologies, represented by vector databases.
Zilliz Cloud: a new member of the unstructured data product family
As a pioneer in the field of unstructured data, Zilliz has been exploring how to manage and use unstructured data and quickly extract value from it, helping enterprises improve efficiency and increase returns.
Zilliz founder and CEO Star Lord said that unstructured data already accounts for more than 80% of all data, and that its growth rate will exceed that of structured data in the next few years. At the same time, the value of unstructured data is far from fully explored. Zilliz will continue to deepen its capabilities in vector data processing, including data observability, workflow management, data security, data privacy, and data applications. To date, Zilliz has contributed two open source projects, Milvus and Towhee, to the unstructured data processing ecosystem, and it continues to deliver innovative solutions in this field.
Star Lord, Zilliz founder and CEO
True to its word, Zilliz used the summit to release a new product, Zilliz Cloud, giving users a new cloud option.
Explaining the motivation for launching Zilliz Cloud, Zilliz partner and technical director Luan Xiaofan said that tools in the unstructured data field are fragmented: users have to stitch various open source components together themselves, which makes processing cumbersome and leads to problems with stability and ease of use.
Zilliz Cloud was created to solve these pain points. It is a managed cloud service built on Milvus by the original Milvus team, the team that best understands unstructured data processing and vector retrieval. Zilliz Cloud offers high availability, cost optimization, and strong scalability. It breaks down the barriers between data sources and handles management tasks such as data conversion, analysis, migration, and visualization. More importantly, through these operations, unstructured data is converted into vector data that can be searched, which brings greater value to the business.
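The idea of converting unstructured data into searchable vectors can be illustrated with a minimal sketch. It is not Zilliz Cloud or Milvus code: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory `index` dictionary stands in for a vector database, and the sample documents are invented for illustration.

```python
import math

def build_vocab(texts):
    """Assign each distinct word a dimension (toy vocabulary, an assumption)."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def embed(text, vocab):
    """Toy embedding: normalized word counts. A real system would use a model."""
    vec = [0.0] * len(vocab)
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def search(query_vec, index, top_k=1):
    """Rank stored vectors by cosine similarity (vectors are already normalized)."""
    scored = [(sum(q * d for q, d in zip(query_vec, vec)), doc_id)
              for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:top_k]]

# Hypothetical documents standing in for unstructured data.
docs = {
    "a": "wire transfer flagged as possible fraud",
    "b": "live stream video content review",
    "c": "semantic search over product reviews",
}
vocab = build_vocab(docs.values())
index = {doc_id: embed(text, vocab) for doc_id, text in docs.items()}

results = search(embed("video review", vocab), index, top_k=1)
print(results)
```

Here the query "video review" shares two words with document "b" and none with the others, so "b" is returned first. A production system would replace the toy embedding with a learned model and the brute-force scan with an approximate nearest neighbor index.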
Figure: the Zilliz product family
Milvus and Towhee, the veterans of the family, were not overshadowed by the newcomer's debut.
Milvus is an open source distributed vector database that integrates the industry's mature vector similarity search techniques and, on that basis, substantially optimizes the high-performance computing framework. The upcoming Milvus 2.2 adds a disk index option (DiskANN). Compared with the traditional purely in-memory index, DiskANN stores the index on the user's local disk, trading a small amount of query performance for a significant reduction in cost: users can deploy the database on lower-cost machines equipped with SSDs. The new version will also add bulk data import, RBAC permission control, query pagination, rate limiting, and backpressure.
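The memory-versus-disk trade-off described above can be sketched in a few lines. This is not DiskANN itself, which is a graph-based index, and it uses no Milvus API. It only illustrates the general pattern, under the assumption that compact codes stay in memory while full-precision vectors live on disk and are read only for a short list of candidates. All names (`coarse_code`, `search`, `vectors.bin`) and the sample vectors are hypothetical.

```python
import os
import struct
import tempfile

DIM = 4  # vector dimensionality for this toy example

def write_vectors(path, vectors):
    """Store full-precision float32 vectors on disk, one after another."""
    with open(path, "wb") as f:
        for v in vectors:
            f.write(struct.pack(f"<{DIM}f", *v))

def read_vector(f, i):
    """Read only the i-th vector from disk."""
    f.seek(i * DIM * 4)
    return struct.unpack(f"<{DIM}f", f.read(DIM * 4))

def coarse_code(v):
    """1 byte per dimension instead of 4; assumes values lie in [0, 1]."""
    return bytes(min(255, max(0, int(round(x * 255)))) for x in v)

def l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search(path, codes, query, top_k=1, n_candidates=2):
    qcode = coarse_code(query)
    # Stage 1 (memory): cheap ranking on the compact codes.
    candidates = sorted(range(len(codes)),
                        key=lambda i: l2(codes[i], qcode))[:n_candidates]
    # Stage 2 (disk): exact distances for the few candidates only.
    with open(path, "rb") as f:
        exact = sorted((l2(read_vector(f, i), query), i) for i in candidates)
    return [i for _, i in exact[:top_k]]

vectors = [
    [0.10, 0.10, 0.10, 0.10],
    [0.90, 0.90, 0.90, 0.90],
    [0.50, 0.50, 0.50, 0.50],
    [0.52, 0.48, 0.50, 0.50],
]
query = [0.52, 0.47, 0.50, 0.50]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vectors.bin")
    write_vectors(path, vectors)
    codes = [coarse_code(v) for v in vectors]  # the only part kept in memory
    nearest = search(path, codes, query)

print(nearest)
```

The in-memory footprint shrinks to a quarter of the full vectors, at the price of extra disk reads per query, which mirrors the cost-for-latency exchange the article describes.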
Towhee complements the coverage of traditional ETL tools. Compared with traditional ETL, ETL for unstructured data handles larger volumes of raw data on the business side, its transformations are oriented toward deep semantics, and the process draws heavily on AI capabilities. With Towhee, any user can build a production-oriented, high-performance unstructured data processing pipeline in Python code with one click. Going forward, Towhee will continue to be optimized and upgraded. It plans to offer a pipeline definition interface similar to those of Spark and Flink on top of the existing one, to integrate more deeply with technology ecosystems such as NVIDIA's in order to further improve the production execution efficiency of the whole pipeline, and to keep working to meet the needs of community users and fill the gap in Chinese-language models.
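As a rough illustration of what a Python-defined processing pipeline for unstructured data looks like, the sketch below chains a decode step, a cleanup step, and an embedding step. It deliberately does not use Towhee's API: `make_pipeline` and the three stage functions are hypothetical names, and `toy_embed` (vowel counts) is only a placeholder for the AI model a real pipeline would call.

```python
from functools import reduce
from typing import Callable, List

def make_pipeline(*stages: Callable) -> Callable:
    """Chain stages so that the output of each one feeds the next."""
    def run(item):
        return reduce(lambda value, stage: stage(value), stages, item)
    return run

def decode(raw: bytes) -> str:
    """Extract: turn raw bytes into text."""
    return raw.decode("utf-8")

def normalize(text: str) -> str:
    """Transform: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())

def toy_embed(text: str) -> List[int]:
    """Placeholder for a model: counts of the vowels a, e, i, o, u."""
    return [text.count(vowel) for vowel in "aeiou"]

text_to_vector = make_pipeline(decode, normalize, toy_embed)
vector = text_to_vector(b"  Hello   World ")
print(vector)
```

The resulting vector would then be loaded into a vector database, completing the extract, transform, load sequence for unstructured data.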
Envisioning application scenarios for unstructured data
The continued growth of unstructured data is driving the continued development of AI-based technologies for analyzing and retrieving it.
According to Zilliz partner and product director Guo Rentong, at the application level, unstructured data search has strong prospects in scenarios such as image search, video search, semantic text search, cross-modal search, recommendation and question answering systems, copyright protection, fraud detection, plagiarism checking, network security, drug discovery, and anomaly detection. At the industry level, the basic software and tools of the unstructured data ecosystem are still far fewer than those of the structured data ecosystem, which leaves very broad room for growth.
The application of technology cannot be separated from practice in different industries. Drawing on their own businesses and on practical problems, many guests explained how to effectively extract semantic information from unstructured data and how to achieve large-scale, high-precision, high-throughput analysis and retrieval of unstructured data.
Tang Minwei, risk control director at China Telecom Bestpay, shared how Bestpay uses Milvus to build a smarter financial risk control system;
Fang Zeyang, senior R&D engineer at Baidu, shared how Milvus, used as a semantic index library, helps Baidu's PaddleNLP improve the accuracy of semantic retrieval;
Li Guanzhao, senior researcher at Huya, shared how Milvus helps the Huya team quickly identify and search for sensitive areas and improve the efficiency of security review of video content;
Kong Yunlong, senior expert on the Momo data platform, shared how Milvus helps Momo detect spam, fake photos, and more.
Financial payments, deep learning, live video streaming, social networking... Milvus is being put into practice and delivering value in more and more fields. This gives us a close-up view of the vast room for growth in unstructured data and vector retrieval, and more confidence in breaking down data silos and achieving high-quality data interconnection.
"A single strand cannot make a thread, and a lone tree cannot make a forest." Exploring the potential of unstructured data in depth is of great significance. The first Unstructured Data Summit of 2022 gave a panoramic view of the technological progress and practical results of unstructured data processing. Looking ahead, Zilliz also issued an initiative: it hopes to use the summit to build consensus, share experience, and innovate together, exploring with more developers, ecosystem partners, and startups, promoting the application of vector databases across industries, and jointly building a bright future for unstructured data.