Source: Analyticsindiamag

Google DeepMind Just Made Small Models Irrelevant with RRTs

  • Google DeepMind, in collaboration with KAIST AI, has proposed Relaxed Recursive Transformers (RRTs), a method that reduces the cost, compute, and memory an LLM needs to run.
  • RRTs let an LLM operate with the footprint of a small language model (SLM) while outperforming many of the standard SLMs available today.
  • Layer tying, the core RRT technique, passes the input through a small set of shared layers recursively, cutting parameter memory and significantly reducing compute (see the first sketch after this list).
  • The "relaxed" part comes from low-rank adaptation (LoRA): small low-rank matrices perturb the shared weights at each recursion depth, so the tied layers can still behave distinctly when processing input (see the second sketch below).
  • Because these low-rank matrices add very few parameters, inference throughput increases, which translates into substantial energy savings.
  • Recursive models built this way show substantial accuracy gains: an RRT uptrained on just 60 billion tokens achieved performance parity with full-size models trained on 3 trillion tokens.
  • By making LLMs more capable without significantly adding to their footprint, RRTs could contribute to meaningful energy savings at scale.
  • Quantisation and layer skipping are other approaches to shrinking LLMs without sacrificing performance; RRTs differ in combining parameter sharing with real-time verification of draft tokens during generation (see the third sketch below).
  • Further research is needed on the uptraining cost of scaling the method to larger models before RRTs can be deployed in real-world applications.
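
For intuition, here is a minimal PyTorch sketch of layer tying: a single transformer block is applied recursively to stand in for a deeper stack. The block type, dimensions, and loop count are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RecursiveTransformer(nn.Module):
    # One shared block reused `num_loops` times: parameter memory is that
    # of a single layer, while effective depth matches a num_loops-layer model.
    def __init__(self, d_model=512, n_heads=8, num_loops=6):
        super().__init__()
        self.shared_block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.num_loops = num_loops

    def forward(self, x):
        for _ in range(self.num_loops):
            x = self.shared_block(x)  # same weights at every depth
        return x

model = RecursiveTransformer()
x = torch.randn(2, 16, 512)   # (batch, sequence, d_model)
print(model(x).shape)         # torch.Size([2, 16, 512])
```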
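The "relaxation" can be sketched as a shared weight plus a per-depth low-rank correction. The class name, sizes, rank, and initialisation below are assumptions for illustration; only the idea (tied base weights, a distinct LoRA-style delta per loop) comes from the article.

```python
import torch
import torch.nn as nn

class RelaxedTiedLinear(nn.Module):
    # One base weight shared by all recursion depths, plus a low-rank delta
    # (A[d] @ B[d]) per depth. The deltas start near zero and diverge during
    # uptraining, letting each depth behave slightly differently.
    def __init__(self, d_in, d_out, num_depths, rank=8):
        super().__init__()
        self.shared = nn.Linear(d_in, d_out, bias=False)
        self.A = nn.ParameterList(
            [nn.Parameter(torch.zeros(d_out, rank)) for _ in range(num_depths)]
        )
        self.B = nn.ParameterList(
            [nn.Parameter(torch.randn(rank, d_in) * 0.01) for _ in range(num_depths)]
        )

    def forward(self, x, depth):
        delta = self.A[depth] @ self.B[depth]        # low-rank, few parameters
        return x @ (self.shared.weight + delta).T    # perturbed shared weight

layer = RelaxedTiedLinear(512, 512, num_depths=6)
x = torch.randn(2, 16, 512)
y0, y5 = layer(x, depth=0), layer(x, depth=5)  # same base, different deltas
```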
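The article's mention of verifying draft tokens can be pictured as early exit from the recursive loop: a cheap prediction after a couple of passes, checked against the full-depth prediction. This is a toy greedy-decoding sketch under assumed names (`shared_block`, `lm_head`) and loop counts, not the paper's inference algorithm.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def draft_then_verify(shared_block, lm_head, h, draft_loops=2, full_loops=6):
    # Draft: exit the recursion early and propose a next token cheaply.
    for _ in range(draft_loops):
        h = shared_block(h)
    draft_token = lm_head(h[:, -1]).argmax(-1)

    # Verify: continue the same hidden state to full depth and re-predict.
    for _ in range(full_loops - draft_loops):
        h = shared_block(h)
    full_token = lm_head(h[:, -1]).argmax(-1)

    # Accept the draft when it matches the full-depth result.
    return full_token, bool((draft_token == full_token).all())

block = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
head = nn.Linear(512, 32000)   # assumed vocabulary size
h = torch.randn(1, 16, 512)
token, accepted = draft_then_verify(block, head, h)
```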
