Data management and storage are key to successful AI
According to market research firm Gartner, by 2026 more generative AI resources will move from the cloud to endpoint and edge devices. RAG (Retrieval-Augmented Generation), a popular approach for building AI applications, relies on retrieving information from large databases, making efficient and reliable edge data storage the first step for enterprises to successfully adopt generative AI.
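To make that retrieval dependence concrete, here is a minimal, illustrative sketch of the retrieval step in a RAG pipeline; the documents, the toy embedding function, and the prompt format are placeholders rather than a reference implementation.

```python
import numpy as np

# Toy corpus standing in for an enterprise document store (placeholder data).
documents = [
    "Q3 sales grew 12% in the EMEA region.",
    "The NAS backup policy runs nightly at 02:00.",
    "Warranty claims must be filed within 30 days.",
]

def embed(text: str) -> np.ndarray:
    """Stand-in embedding: a normalized bag-of-characters vector.
    A real system would call an embedding model instead."""
    vec = np.zeros(256)
    for ch in text.lower():
        vec[ord(ch) % 256] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    scores = doc_vectors @ embed(query)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

query = "When do backups run?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # This prompt would then be sent to the generative model.
```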
Create a high-reliability on-premises AI storage database with QNAP NAS
High-performance, large-capacity QNAP NAS is suitable for storing raw data and can serve as storage/backup servers for RAG.
Scenario 1: Cross-platform storage of large raw data
QNAP NAS enables seamless access between local and cloud storage, making it ideal for storing raw data, videos, and photos from various platforms.
•Supports S3 Compatible Object Storage, allowing migration of cloud-stored data to NAS (see the sketch after this list)
•Supports native Samba, NFS protocols for seamless data access and sharing across Windows, Linux, macOS, and other platforms
•Supports WORM, powerful data search, RAID data protection, and permission control to prevent unauthorized data modification, ensuring data integrity and consistency
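As a sketch of how such a migration could look in practice, the snippet below copies objects from a cloud S3 bucket into an S3-compatible bucket exposed by the NAS using boto3; the endpoint URL, bucket names, and credentials are placeholders.

```python
import boto3

# Source: public cloud S3. Destination: S3-compatible object storage on the NAS.
# All endpoints, bucket names, and credentials below are placeholders.
cloud = boto3.client("s3")
nas = boto3.client(
    "s3",
    endpoint_url="https://nas.example.local:443",
    aws_access_key_id="NAS_ACCESS_KEY",
    aws_secret_access_key="NAS_SECRET_KEY",
)

paginator = cloud.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="cloud-raw-data"):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        # Read each object from the cloud bucket and write it to the NAS bucket.
        body = cloud.get_object(Bucket="cloud-raw-data", Key=key)["Body"].read()
        nas.put_object(Bucket="nas-raw-data", Key=key, Body=body)
        print(f"copied {key}")
```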
Scenario 2: Storage/backup server for RAG
QNAP NAS provides petabyte-scale storage potential, advanced snapshot and backup technologies, and all-flash arrays, meeting the stringent requirements for frequent data access and processing in RAG.
•The industry's most complete all-flash NAS product lineup provides greater options and flexibility
•Supports 25/100GbE high-speed networks to unleash the full potential of all-flash NAS for fast data retrieval with high IOPS and low latency
•Natively supports iSCSI / Samba protocols to mount NAS storage space for AI computing servers or other storage devices
•Automatically backs up vector data to QNAP NAS S3 Object Storage Space on a regular basis, simplifying data backup and restoration
•Supports container technologies to accelerate the deployment and management of vector databases (see the sketch after this list)
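As a sketch of the container-based workflow, the snippet below uses the Docker SDK for Python to launch a vector database container (Qdrant is used only as an example image) with its storage directory on a NAS-backed path; the image, port, and paths are assumptions, not recommendations.

```python
import docker

# Launch a vector database container with its data directory on NAS-backed storage.
# The image, port, and mount path are examples only.
client = docker.from_env()

container = client.containers.run(
    image="qdrant/qdrant",            # example vector database image
    name="vector-db",
    detach=True,
    ports={"6333/tcp": 6333},         # expose the database's HTTP API
    volumes={
        "/share/vector-db": {         # NAS share mounted on the host (placeholder path)
            "bind": "/qdrant/storage",
            "mode": "rw",
        }
    },
    restart_policy={"Name": "unless-stopped"},
)
print(container.status, container.id)
```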
Taipei, Taiwan, November 7, 2024 – QNAP® Systems, Inc. today reaffirmed its commitment to product reliability and customer satisfaction by announcing long-term support (LTS) for its QTS 5.2 and QuTS hero h5.2 operating systems. This initiative is part of QNAP's rigorous software product version lifecycle management policy, which is designed to help users effectively manage their IT infrastructure with predictability and ease.
"As technology evolves, so does the need for a stable and reliable IT environment," said Tim Lin, Product Manager of QNAP. "By providing long-term support for QNAP NAS operating systems, we ensure that our users can continue to depend on QNAP products for their critical data storage needs without concern for frequent major upgrades or compatibility issues."
Key highlights of the LTS initiative:
•Extended Support Duration: QNAP is extending the support for QTS 5.2 and QuTS hero h5.2 to August 2029, ensuring critical security patches and updates are available for a longer period.
•Predictable Management: The lifecycle management policy guarantees that updates and support are predictable, enabling IT administrators to plan upgrades and maintenance without unexpected disruptions.
•Enhanced Reliability: With LTS, users gain enhanced system stability and reduced risk of downtime, which is crucial for businesses where data availability is paramount.
This is one of the key innovations driving the rapid recent acceleration in the capabilities of language models. Transformers not only improved prediction accuracy, they are also easier and more efficient to train than previous models, allowing for larger model sizes. This is what the GPT architecture above is based on.
If you look at the GPT architecture, you can see that it is great for generating the next word in a sequence. It fundamentally follows the same logic we discussed in Part 1: start with a few words and then continue generating one at a time. But what if you wanted to do translation? What if you had a sentence in German (e.g. “Wo wohnst du?” = “Where do you live?”) and you wanted to translate it into English? How would we train the model to do this?
Well, the first thing we would need to do is figure out a way to input German words, which means we have to expand our embedding to include both German and English. Here is a simple way of inputting the information: why don't we just concatenate the German sentence in front of whatever English has been generated so far and feed that to the model as the context? To make it easier for the model, we can add a separator. At each step, this would look something like this:
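Here is a minimal sketch of how that input could be assembled at each step (tokenization is simplified to whole words, and predict_next_word is a placeholder for the trained model):

```python
german = "Wo wohnst du ?".split()
sep = ["<SEP>"]
generated_english = []          # grows by one word per step

# Hypothetical next-word predictor standing in for the trained model.
def predict_next_word(context):
    target = "Where do you live ?".split()
    return target[len(context) - len(german) - len(sep)]

for _ in range(5):
    context = german + sep + generated_english   # what the model actually sees
    next_word = predict_next_word(context)
    generated_english.append(next_word)
    print(" ".join(context), "->", next_word)
```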
This will work, but it has room for improvement:
If the context length is fixed, the original German sentence can sometimes get pushed out of the context
The model has a lot to learn here: two languages simultaneously, but also that <SEP> is the separator token where it needs to start translating
You are processing the entire German sentence, at different offsets, for each word you generate. This means there will be different internal representations of the same sentence, and the model has to be able to work through all of them to translate
The transformer was originally created for this task and consists of an “encoder” and a “decoder”, which are basically two separate blocks. One block simply takes the German sentence and gives out an intermediate representation (again, basically a bunch of numbers); this is called the encoder.
The second block generates words (we've seen a lot of this so far). The only difference is that in addition to feeding it the words generated so far, we also feed it the encoded German sentence (from the encoder block). So as it is generating language, its context is basically all the words generated so far plus the German. This block is called the decoder.
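Here is a rough sketch of that split, using PyTorch's built-in encoder and decoder modules purely for illustration; the dimensions and random inputs are arbitrary.

```python
import torch
import torch.nn as nn

d_model, nhead = 512, 8

encoder_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
decoder_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=2)

# Embedded German sentence (batch of 1, 4 tokens) and the English generated so far (3 tokens).
german = torch.randn(1, 4, d_model)
english_so_far = torch.randn(1, 3, d_model)

memory = encoder(german)                          # intermediate representation of the German
out = decoder(tgt=english_so_far, memory=memory)  # decoder sees its own words plus the encoded German
print(out.shape)  # torch.Size([1, 3, 512]) -> one vector per generated position
```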
The encoder and the decoder each consist of a few blocks, notably the attention block sandwiched between other layers. Let's look at the illustration of a transformer from the paper “Attention Is All You Need” and try to understand it:
The vertical set of blocks on the left is called the “encoder” and the one on the right is called the “decoder”. Let's go over anything that we have not already covered before:
Recap on how to read the diagram: Each of the boxes here is a block that takes in some inputs in the form of neurons, and spits out a set of neurons as output that can then either be processed by the next block or interpreted by us. The arrows show where the output of a block is going. As you can see, we will often take the output of one block and feed it in as input into multiple blocks. Let’s go through each thing here:
Feed forward: A feedforward network is one that does not contain cycles. Our original network in section 1 was a feedforward network. In fact, this block uses very much the same structure: it contains two linear layers, each followed by a RELU (see the note on RELU in the first section) and a dropout layer. Keep in mind that this feedforward network applies to each position independently. What this means is that the information at position 0 has a feedforward network, position 1 has one, and so on, but the neurons from position x have no linkage to the feedforward network of position y. This is important because if we did not do this, it would allow the network to cheat during training by looking forward.
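Here is a minimal sketch of such a position-wise feedforward block; the layer sizes and dropout rate are just example values.

```python
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    """Two linear layers with RELU and dropout, applied to each position independently."""
    def __init__(self, d_model=512, d_hidden=2048, dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(d_hidden, d_model),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        # x: (batch, positions, d_model). nn.Linear acts on the last dimension only,
        # so each position is transformed separately, with shared weights.
        return self.net(x)

ff = FeedForward()
print(ff(torch.randn(1, 10, 512)).shape)  # torch.Size([1, 10, 512])
```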
Cross-attention: You will notice that the decoder has a multi-head attention block with arrows coming from the encoder. What is going on here? Remember the value, key, and query in self-attention and multi-head attention? They all came from the same sequence; in fact, the query was just from the last word of the sequence. So what if we kept the query but fetched the value and key from a completely different sequence altogether? That is what is happening here: the value and key come from the output of the encoder. Nothing has changed mathematically except where the inputs for key and value now come from.
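Sketched with PyTorch's multi-head attention module (dimensions arbitrary), with the query taken from the decoder side and the key and value taken from the encoder output:

```python
import torch
import torch.nn as nn

d_model, nhead = 512, 8
cross_attention = nn.MultiheadAttention(d_model, nhead, batch_first=True)

decoder_states = torch.randn(1, 3, d_model)   # words generated so far (query)
encoder_output = torch.randn(1, 4, d_model)   # encoded German sentence (key and value)

out, weights = cross_attention(
    query=decoder_states,
    key=encoder_output,
    value=encoder_output,
)
print(out.shape)      # torch.Size([1, 3, 512]): one attended vector per decoder position
print(weights.shape)  # torch.Size([1, 3, 4]): how much each decoder position attends to each German token
```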
Nx: The Nx here simply represents that this block is chain-repeated N times. So basically you are stacking the block back-to-back and passing the input from the previous block to the next one. This is a way to make the neural network deeper. Now, looking at the diagram there is room for confusion about how the encoder output is fed to the decoder. Let’s say N=5. Do we feed the output of each encoder layer to the corresponding decoder layer? No. Basically you run the encoder all the way through once and only once. Then you just take that representation and feed the same thing to every one of the 5 decoder layers.
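Here is a sketch of that stacking, with the same final encoder output passed as memory to every decoder layer; N and the dimensions are arbitrary.

```python
import torch
import torch.nn as nn

d_model, nhead, N = 512, 8, 5

encoder_layers = nn.ModuleList(nn.TransformerEncoderLayer(d_model, nhead, batch_first=True) for _ in range(N))
decoder_layers = nn.ModuleList(nn.TransformerDecoderLayer(d_model, nhead, batch_first=True) for _ in range(N))

src = torch.randn(1, 4, d_model)   # embedded German sentence
tgt = torch.randn(1, 3, d_model)   # embedded English generated so far

# Run the encoder stack once, layer by layer.
memory = src
for layer in encoder_layers:
    memory = layer(memory)

# Every decoder layer gets the *same* final encoder output as its memory.
out = tgt
for layer in decoder_layers:
    out = layer(out, memory)

print(out.shape)  # torch.Size([1, 3, 512])
```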
Add & Norm block: This is basically the same as below (I guess the authors were just trying to save space)
Everything else has already been discussed. Now you have a complete explanation of the transformer architecture, built up from simple sum and product operations and fully self-contained! You know what every line, every sum, every box and word means in terms of how to build them from scratch. Theoretically, these notes contain what you need to code up the transformer from scratch. In fact, if you are interested, this repo does that for the GPT architecture above.
API architectural styles determine how applications communicate. The choice of an API architecture can have significant implications on the efficiency, flexibility, and robustness of an application. So it is very important to choose based on your application's requirements, not just what is often used. Let’s examine some prominent styles:
REST: A cornerstone in web services, REST leverages HTTP methods for streamlined operations and a consistent interface. Its stateless nature ensures scalability, while URI-based resource identification provides structure. REST's strength lies in its simplicity, enabling scalable and maintainable systems. Learn more about REST here: https://drp.li/what-is-a-rest-api-z7lk…
GraphQL: Whilst REST uses multiple endpoints for each resource and necessitates multiple requests to obtain interconnected data, GraphQL uses a single endpoint, allows clients to specify exactly the data they need, and delivers it in a single query. This approach reduces over-fetching, improving both performance and user experience. Learn more about GraphQL here: https://drp.li/graphql-how-does-it-work-z7l…
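To make the contrast concrete, here is a sketch using Python's requests library against a hypothetical API; the endpoints, fields, and GraphQL schema are invented for illustration.

```python
import requests

BASE = "https://api.example.com"  # hypothetical service

# REST: one request per resource, then follow-up requests for related data.
user = requests.get(f"{BASE}/users/42").json()
orders = requests.get(f"{BASE}/users/42/orders").json()

# GraphQL: a single endpoint, a single request, only the fields we ask for.
query = """
{
  user(id: 42) {
    name
    orders { id total }
  }
}
"""
result = requests.post(f"{BASE}/graphql", json={"query": query}).json()
```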
SOAP: Once dominant, SOAP remains vital in enterprises for its security and transactional robustness. It's XML-based, versatile across various transport protocols, and includes WS-Security for comprehensive message security. Learn more about SOAP here: https://drp.li/soap-how-does-it-work-z7lk…
gRPC: Efficient in distributed systems, gRPC offers bidirectional streaming and multiplexing. Its use of Protocol Buffers ensures efficient serialization, and it is suitable for a variety of programming languages and use cases across different domains. Learn more about gRPC here: https://drp.li/what-is-grpc-z7lk…
WebSockets: For applications demanding real-time communication, WebSockets provide a full-duplex communication channel over a single, long-lived connection. They are popular for applications requiring low latency and continuous data exchange. Learn more about WebSockets here: https://drp.li/webhooks-and-websockets-z7lk…
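For example, here is a minimal client sketch with the Python websockets library; the URL and message format are hypothetical.

```python
import asyncio
import websockets

async def main():
    # Open one long-lived, full-duplex connection (URL is a placeholder).
    async with websockets.connect("wss://example.com/ticker") as ws:
        await ws.send("subscribe:BTC-USD")       # client -> server
        while True:
            update = await ws.recv()             # server -> client, pushed as it happens
            print("received:", update)

asyncio.run(main())
```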
MQTT: A lightweight messaging protocol optimized for high-latency or unreliable networks. Its pub/sub model ensures efficient data dissemination among a vast array of devices, making it a go-to choice for IoT applications. Learn more about MQTT here: https://drp.li/automation-with-mqtt-z7lk…
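For example, here is a minimal publish/subscribe sketch with the paho-mqtt client (written against its 1.x callback API); the broker address and topic are placeholders.

```python
import paho.mqtt.client as mqtt

BROKER, TOPIC = "broker.example.com", "sensors/greenhouse/temperature"  # placeholders

def on_connect(client, userdata, flags, rc):
    client.subscribe(TOPIC)                 # subscribe once the connection is up

def on_message(client, userdata, msg):
    print(f"{msg.topic}: {msg.payload.decode()}")

client = mqtt.Client()                      # paho-mqtt 1.x style constructor
client.on_connect = on_connect
client.on_message = on_message
client.connect(BROKER, 1883)                # standard unencrypted MQTT port
client.publish(TOPIC, "21.5")               # a device would publish readings like this
client.loop_forever()                       # process network traffic and callbacks
```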
API architectural styles are more than just communication protocols; they are strategic choices that influence the very fabric of application interactions. There is no best architectural style. Each offers unique benefits, shaping the functionality and interaction of applications. It's about making the right choice(s) based on your application's requirements. If you want to learn more about API development, the 2024 State of the API Report has just been released. Check it out for key trends and insights: https://drp.li/state-of-the-api-report-z7dp…
Affiliation at the time of the award: University of Toronto, Toronto, Canada
Prize motivation: “for foundational discoveries and inventions that enable machine learning with artificial neural networks”
When we talk about artificial intelligence, we often mean machine learning using artificial neural networks. This technology was originally inspired by the structure of the brain. In an artificial neural network, the brain’s neurons are represented by nodes that have different values. In 1983–1985, Geoffrey Hinton used tools from statistical physics to create the Boltzmann machine, which can learn to recognise characteristic elements in a set of data. The invention became significant, for example, for classifying and creating images.
Geoffrey Hinton was a VP and engineering fellow at Google and a pioneer of deep learning who developed some of the most important techniques at the heart of modern AI.
According to the Times, Hinton says he has new fears about the technology he helped usher in and wants to speak openly about them, and that a part of him now regrets his life’s work.
Hinton, who will be speaking live to MIT Technology Review at EmTech Digital on Wednesday in his first post-resignation interview, was a joint recipient with Yann LeCun and Yoshua Bengio of the 2018 Turing Award, computing's equivalent of the Nobel.
“Geoff’s contributions to AI are tremendous,” says LeCun, who is chief AI scientist at Meta. “He hadn’t told me he was planning to leave Google, but I’m not too surprised.”
The 75-year-old computer scientist has divided his time between the University of Toronto and Google since 2013, when the tech giant acquired Hinton’s AI startup DNNresearch. Hinton’s company was a spinout from his research group, which was doing cutting-edge work with machine learning for image recognition at the time. Google used that technology to boost photo search and more.
Hinton has long called out ethical questions around AI, especially its co-optation for military purposes. He has said that one reason he chose to spend much of his career in Canada is that it is easier to get research funding that does not have ties to the US Department of Defense.
“Geoff has made foundational breakthroughs in AI, and we appreciate his decade of contributions at Google,” says Google chief scientist Jeff Dean. “I’ve deeply enjoyed our many conversations over the years. I’ll miss him, and I wish him well.”
Dean says: “As one of the first companies to publish AI Principles, we remain committed to a responsible approach to AI. We’re continually learning to understand emerging risks while also innovating boldly.”
Hinton is best known for an algorithm called backpropagation, which he first proposed with two colleagues in the 1980s. The technique, which allows artificial neural networks to learn, today underpins nearly all machine-learning models. In a nutshell, backpropagation is a way to adjust the connections between artificial neurons over and over until a neural network produces the desired output.
Hinton believed that backpropagation mimicked how biological brains learn. He has been looking for even better approximations ever since, but he has never improved on it.
“In my numerous discussions with Geoff, I was always the proponent of backpropagation and he was always looking for another learning procedure, one that he thought would be more biologically plausible and perhaps a better model of how learning works in the brain,” says LeCun.
“Geoff Hinton certainly deserves the greatest credit for many of the ideas that have made current deep learning possible,” says Bengio, who is a professor at the University of Montreal and scientific director of the Montreal Institute for Learning Algorithms. “I assume this also makes him feel a particularly strong sense of responsibility in alerting the public about potential risks of the ensuing advances in AI.”