techminis
A naukri.com initiative

ML News

Source: Arxiv

Self-Guided Process Reward Optimization with Masked Step Advantage for Process Reinforcement Learning

  • Process Reinforcement Learning (PRL) has shown potential in enhancing the reasoning abilities of Large Language Models (LLMs).
  • A novel framework called Self-Guided Process Reward Optimization (SPRO) is proposed for process-aware RL, built on two key innovations, including the Masked Step Advantage named in the title (a hedged sketch of that idea follows this list).
  • SPRO outperforms vanilla GRPO with higher training efficiency and improved test accuracy, without incurring additional computational overhead.
  • Experimental results show SPRO maintains stable policy entropy, reduces response length, and prevents reward hacking, making it suitable for industrial implementation.
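The summary does not spell out the advantage computation, so the following is a minimal, illustrative sketch of a GRPO-style group-relative advantage with a per-step mask. It is not the paper's exact SPRO/MSA formulation; the function and tensor names are assumptions.

```python
# Hedged sketch: a GRPO-style group-relative advantage with a per-step mask.
# NOT the exact SPRO / Masked Step Advantage algorithm, only an illustration
# of the general idea; names and shapes are assumptions.
import torch

def masked_step_advantage(step_rewards: torch.Tensor,
                          step_mask: torch.Tensor) -> torch.Tensor:
    """step_rewards: (group_size, num_steps) per-step process rewards for a
    group of responses to the same prompt.
    step_mask: (group_size, num_steps), 1.0 for real reasoning steps and
    0.0 for padding beyond a response's length."""
    # Cumulative process reward up to each step.
    cum_rewards = torch.cumsum(step_rewards * step_mask, dim=-1)
    # Group baseline at each step, over responses that actually have that step.
    denom = step_mask.sum(dim=0).clamp(min=1.0)
    baseline = (cum_rewards * step_mask).sum(dim=0) / denom
    # Advantage of each step relative to the group, masked for padding.
    return (cum_rewards - baseline) * step_mask

if __name__ == "__main__":
    g, t = 4, 6  # group of 4 responses, up to 6 steps each
    rewards = torch.randn(g, t)
    mask = (torch.arange(t) < torch.tensor([[6], [4], [5], [3]])).float()
    print(masked_step_advantage(rewards, mask).shape)  # torch.Size([4, 6])
```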


Source: Arxiv

How Weight Resampling and Optimizers Shape the Dynamics of Continual Learning and Forgetting in Neural Networks

  • Recent work in continual learning has shown the benefits of resampling the weights of a neural network's last layer, a procedure known as 'zapping' (a minimal sketch follows this list).
  • Researchers investigated learning and forgetting patterns within convolutional neural networks during training under challenging scenarios like continual learning and few-shot transfer learning.
  • Experiments demonstrated that models trained with 'zapping' recover faster when transitioning to new domains.
  • The study also highlighted how the choice of optimizer affects the dynamics of learning and forgetting, leading to complex patterns of synergy or interference between tasks during sequential learning.
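The summary describes 'zapping' only at a high level; a minimal, assumed sketch of resampling (re-initializing) the final classification layer before moving to a new task might look like the following in PyTorch. The model and helper names are placeholders, not the authors' code.

```python
# Hedged sketch of "zapping": resample (re-initialize) the last-layer weights
# of a network before continuing training on a new task/domain.
# Illustrative only; not the authors' implementation.
import torch.nn as nn

def zap_last_layer(model: nn.Module) -> None:
    """Re-initialize (resample) the final Linear layer's weights in-place."""
    last_linear = None
    for module in model.modules():
        if isinstance(module, nn.Linear):
            last_linear = module              # remember the last Linear seen
    if last_linear is not None:
        nn.init.kaiming_normal_(last_linear.weight)
        if last_linear.bias is not None:
            nn.init.zeros_(last_linear.bias)

# Assumed usage in a toy continual-learning loop:
# for task_loader in tasks:                   # 'tasks' is a list of DataLoaders
#     zap_last_layer(model)                   # resample the head before each task
#     train_one_task(model, task_loader)      # placeholder training routine
```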


Source: Arxiv

A Privacy-Preserving Indoor Localization System based on Hierarchical Federated Learning

  • A new Privacy-Preserving Indoor Localization System based on Hierarchical Federated Learning is proposed to address the accuracy errors and privacy concerns of traditional indoor localization techniques.
  • The system utilizes Federated Learning (FL) with a Deep Neural Network (DNN) model for dynamic indoor localization, addressing privacy, bandwidth, and server reliability issues (a minimal federated-averaging sketch follows this list).
  • Experimental results show that the FL-based approach achieves performance similar to a centralized (CL) model while ensuring data privacy, bandwidth efficiency, and server reliability.
  • The research suggests that this FL approach offers a secure and efficient solution for indoor localization, contributing to advancements in privacy-enhanced indoor positioning systems.
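As a rough illustration of the federated setup described above, here is a minimal federated-averaging (FedAvg) sketch over client-held data. It is a generic FL skeleton under assumed names and shapes, not the paper's hierarchical system.

```python
# Hedged sketch: plain federated averaging (FedAvg) over clients' local DNNs.
# A generic illustration of the FL idea in the summary, not the paper's
# hierarchical architecture; all names and shapes are assumptions.
import copy
import torch
import torch.nn as nn

def local_update(model, loader, epochs=1, lr=1e-3):
    """Train a copy of the global model on one client's local data."""
    local = copy.deepcopy(model)
    opt = torch.optim.SGD(local.parameters(), lr=lr)
    loss_fn = nn.MSELoss()                    # e.g. regressing an (x, y) position
    for _ in range(epochs):
        for features, target in loader:
            opt.zero_grad()
            loss_fn(local(features), target).backward()
            opt.step()
    return local.state_dict()

def fed_avg(state_dicts):
    """Average client weights into a new global state dict."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(0)
    return avg

# One communication round (client_loaders is an assumed list of DataLoaders):
# client_states = [local_update(global_model, dl) for dl in client_loaders]
# global_model.load_state_dict(fed_avg(client_states))
```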


Source: Arxiv

GradMetaNet: An Equivariant Architecture for Learning on Gradients

  • Practitioners often treat gradients of neural networks as inputs to task-specific algorithms for optimization, editing, and analysis.
  • A new paper introduces GradMetaNet, an architecture designed specifically for processing gradients by following principles like equivariant design and efficient gradient representation (a generic equivariant set-network sketch follows this list).
  • GradMetaNet is demonstrated to outperform previous approaches in approximating natural gradient-based functions for tasks like learned optimization, INR editing, and loss landscape curvature estimation.
  • The architecture, based on simple equivariant blocks, is proven to be universal and effective on a variety of gradient-based tasks involving MLPs and transformers.
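The summary only names the design principles, so the following is a generic permutation-invariant (DeepSets-style) sketch that consumes a set of per-example gradient vectors. It illustrates the kind of symmetry-aware processing mentioned, not GradMetaNet's actual architecture; all names and sizes are assumptions.

```python
# Hedged sketch: a DeepSets-style network over a set of per-example gradients.
# Invariant to the order of examples; NOT GradMetaNet itself.
import torch
import torch.nn as nn

class GradSetEncoder(nn.Module):
    def __init__(self, grad_dim: int, hidden: int = 128, out_dim: int = 1):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(grad_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, grads: torch.Tensor) -> torch.Tensor:
        # grads: (num_examples, grad_dim) flattened per-example gradients.
        # Summing over the example axis makes the output independent of the
        # order of examples, a simple instance of the symmetry principle.
        return self.rho(self.phi(grads).sum(dim=0))

if __name__ == "__main__":
    per_example_grads = torch.randn(32, 512)            # assumed shapes
    print(GradSetEncoder(512)(per_example_grads).shape)  # torch.Size([1])
```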


Source: Arxiv

AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training

  • AsyncFlow is an asynchronous streaming RL framework designed for efficient post-training of large language models.
  • It aims to address scalability bottlenecks faced by traditional RL frameworks and challenges in complex dataflows, resource idling, and workload imbalance.
  • AsyncFlow introduces distributed data storage and transfer modules, automated pipeline overlapping, and producer-consumer-based asynchronous workflows for improved computational efficiency (a toy producer-consumer sketch follows this list).
  • The framework is decoupled from the underlying training and inference engines, allowing for modular, customizable use. Extensive experiments show a significant throughput improvement over existing baselines.
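To make the producer-consumer idea concrete, here is a toy asynchronous rollout/training loop using Python's asyncio queues. It only illustrates overlapping generation with training through a bounded buffer; it is not AsyncFlow's API, and all names are invented.

```python
# Hedged sketch: overlap rollout generation (producer) with training (consumer)
# through a bounded asyncio queue. Illustrative only; not AsyncFlow's API.
import asyncio
import random

async def rollout_producer(queue: asyncio.Queue, num_batches: int):
    for step in range(num_batches):
        await asyncio.sleep(random.uniform(0.01, 0.05))  # stand-in for generation
        await queue.put({"step": step, "trajectories": [0.0] * 8})
    await queue.put(None)                                # sentinel: no more data

async def train_consumer(queue: asyncio.Queue):
    while True:
        batch = await queue.get()
        if batch is None:
            break
        await asyncio.sleep(0.02)                        # stand-in for a train step
        print(f"trained on rollout batch {batch['step']}")

async def main():
    queue = asyncio.Queue(maxsize=4)   # bounded buffer keeps the stages in sync
    await asyncio.gather(rollout_producer(queue, 10), train_consumer(queue))

if __name__ == "__main__":
    asyncio.run(main())
```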


Source: Arxiv

GPT, But Backwards: Exactly Inverting Language Model Outputs

  • A new technique has been developed to reconstruct the exact input that led to a language model's output, aiding in post-incident analysis and fake output detection.
  • The technique, called SODA, is a gradient-based algorithm that outperforms existing methods at recovering shorter, out-of-distribution inputs from language models (a simplified inversion sketch follows this list).
  • The experiments conducted on LLMs ranging from 33M to 3B parameters showed that SODA was successful in fully recovering 79.5% of shorter inputs but faced challenges with longer input sequences.
  • The study suggests that standard deployment practices may currently offer sufficient protection against the potential misuse of this reconstructive method.
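The bullets describe SODA only as a gradient-based inversion method, so here is a heavily simplified sketch of the general idea: optimize a relaxed (soft) input representation so the model reproduces an observed output distribution. It uses a toy embedding-plus-linear "model" for self-containment and is not the SODA algorithm itself; all names and sizes are assumptions.

```python
# Hedged sketch of gradient-based input inversion: optimize a soft one-hot
# input so a (toy) model reproduces an observed output distribution.
# Not the SODA algorithm; everything here is a simplified illustration.
import torch
import torch.nn.functional as F

vocab, seq_len, dim = 100, 5, 32
torch.manual_seed(0)
emb = torch.randn(vocab, dim)                 # toy embedding table
head = torch.randn(dim, vocab)                # toy output head

def toy_model(soft_tokens):                   # soft_tokens: (seq_len, vocab)
    hidden = soft_tokens @ emb                # soft embedding lookup
    return (hidden @ head).mean(dim=0)        # (vocab,) toy "next-token" logits

# Observed output produced by some unknown discrete input.
true_input = F.one_hot(torch.randint(0, vocab, (seq_len,)), vocab).float()
target_logits = toy_model(true_input).detach()

# Invert: optimize logits over the input tokens to match the observed output.
input_logits = torch.zeros(seq_len, vocab, requires_grad=True)
opt = torch.optim.Adam([input_logits], lr=0.1)
for _ in range(500):
    opt.zero_grad()
    soft_tokens = F.softmax(input_logits, dim=-1)
    loss = F.mse_loss(toy_model(soft_tokens), target_logits)
    loss.backward()
    opt.step()

candidate = input_logits.argmax(dim=-1)
print("candidate input tokens:", candidate.tolist())
```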


Source: Arxiv

PERTINENCE: Input-based Opportunistic Neural Network Dynamic Execution

  • Deep neural networks (DNNs) are widely used for modeling complex patterns in various domains but can be resource-intensive.
  • A new method, PERTINENCE, dynamically selects suitable models from a pre-trained set based on input complexity to improve efficiency without compromising accuracy (a toy input-based selection sketch follows this list).
  • Its genetic algorithm-based approach balances overall accuracy and computational efficiency by optimizing the selection process.
  • The method showcased promising results on CIFAR-10, CIFAR-100, and TinyImageNet datasets, achieving comparable accuracy with up to 36% fewer operations than existing state-of-the-art models.
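The following is a toy sketch of input-based dynamic model selection: a cheap complexity score routes each input to a small or a large pre-trained model. The score and threshold are invented for illustration and do not correspond to PERTINENCE's learned, genetic-algorithm-optimized selection policy.

```python
# Hedged sketch: route each input to a cheaper or a larger model based on a
# simple complexity score. Illustrative only; PERTINENCE learns its selection
# policy (via a genetic algorithm), which is not reproduced here.
import torch
import torch.nn as nn

small_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
large_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512),
                            nn.ReLU(), nn.Linear(512, 10))

def complexity_score(image: torch.Tensor) -> float:
    # Invented proxy: pixel variance as a crude measure of input complexity.
    return image.var().item()

def predict(image: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    model = small_model if complexity_score(image) < threshold else large_model
    with torch.no_grad():
        return model(image.unsqueeze(0)).argmax(dim=-1)

if __name__ == "__main__":
    batch = torch.rand(4, 3, 32, 32)          # stand-in for CIFAR-10 images
    print([predict(img).item() for img in batch])
```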


Source: Arxiv

Relational Causal Discovery with Latent Confounders

  • Estimating causal effects from real-world relational data can be challenging when the underlying causal model and potential confounders are unknown.
  • A new algorithm called RelFCI has been proposed to address the challenge of learning causal models with latent confounders from relational data.
  • RelFCI builds upon existing causal inference and relational causal discovery algorithms to provide sound and complete causal discovery in relational domains.
  • Experimental results show the effectiveness of RelFCI in identifying the correct causal structure in relational causal models with latent confounders.


Source: Arxiv

Revisiting Learning Rate Control

  • The learning rate is a crucial hyperparameter in deep learning, prompting research in both AutoML and deep learning on how to control it effectively.
  • This paper compares different approaches to learning rate control, including classic optimization and online scheduling based on gradient statistics (a minimal scheduler-comparison sketch follows this list).
  • Results show that while certain methods perform well on specific deep learning tasks, they lack reliability across different settings, emphasizing the need for improved algorithm selection in learning rate control.
  • There is a growing trend indicating that hyperparameter optimization approaches are less effective as models and tasks become more complex, suggesting the importance of exploring new directions like finetunable methods and meta-learning in AutoML.
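As a deliberately simple illustration of comparing learning rate control strategies, the sketch below trains the same toy model under a constant learning rate and under a cosine annealing schedule and reports the final losses. It is a generic harness under assumed settings, not the paper's benchmark.

```python
# Hedged sketch: compare two learning-rate control strategies on a toy task.
# A generic harness for illustration; not the paper's experimental setup.
import torch
import torch.nn as nn

def train(schedule: str, steps: int = 200, lr: float = 0.1) -> float:
    torch.manual_seed(0)
    x = torch.randn(256, 10)
    y = x @ torch.randn(10, 1) + 0.1 * torch.randn(256, 1)
    model = nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    sched = (torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=steps)
             if schedule == "cosine" else None)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
        if sched is not None:
            sched.step()          # online control of the learning rate
    return loss.item()

if __name__ == "__main__":
    for schedule in ("constant", "cosine"):
        print(schedule, train(schedule))
```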


Source: Arxiv

Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training

  • Gradient-based optimization in deep learning raises privacy and security concerns due to data poisoning attacks and overfitting risks.
  • Black box optimization methods offer an alternative by treating the model as an opaque function, but face challenges in scalability and computational costs, especially in large language models (LLMs).
  • A new method called BBoxER is introduced for LLM post-training, inducing an information bottleneck via implicit compression of the training data (a generic black-box search sketch follows this list).
  • BBoxER provides theoretical bounds on generalization, privacy, data poisoning attacks, and robustness to extraction attacks, demonstrating promising results in experiments with LLMs.
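To illustrate the black-box (gradient-free) optimization idea the summary refers to, here is a minimal (1+1) evolution strategy over a small parameter vector, with the objective treated as an opaque function. It is a textbook sketch, not BBoxER, and the objective is a stand-in for an expensive LLM evaluation.

```python
# Hedged sketch: a (1+1) evolution strategy treating the objective as a black
# box (no gradients). A textbook illustration, not the BBoxER method; the
# objective here is a stand-in for an expensive LLM evaluation.
import numpy as np

def black_box_score(params: np.ndarray) -> float:
    # Placeholder objective; in the real setting this would evaluate the model
    # with the candidate parameters and return a validation score.
    return -np.sum((params - 0.3) ** 2)

def one_plus_one_es(dim: int = 8, iters: int = 200, sigma: float = 0.1):
    rng = np.random.default_rng(0)
    best = rng.normal(size=dim)
    best_score = black_box_score(best)
    for _ in range(iters):
        candidate = best + sigma * rng.normal(size=dim)
        score = black_box_score(candidate)
        if score >= best_score:            # keep the candidate if no worse
            best, best_score = candidate, score
    return best, best_score

if __name__ == "__main__":
    params, score = one_plus_one_es()
    print("best score:", round(score, 4))
```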


Source: Arxiv

Enhanced Generative Model Evaluation with Clipped Density and Coverage

  • Reliably evaluating the sample quality of generative models for critical applications remains difficult, particularly with respect to the notions of fidelity and coverage.
  • To address this issue, two novel metrics, Clipped Density and Clipped Coverage, have been introduced to prevent out-of-distribution samples from biasing the aggregated values (a simplified clipping sketch follows this list).
  • These metrics exhibit linear score degradation as the proportion of poor samples increases, making them easily interpretable as proportions of good samples.
  • Extensive experiments show that Clipped Density and Clipped Coverage outperform existing metrics in robustness, sensitivity, and interpretability when evaluating generative models.
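The sketch below shows a simplified nearest-neighbour density-style score in which each generated sample's contribution is clipped to at most 1, so a single outlier cannot inflate the aggregate. It is an assumed illustration of the clipping idea on top of standard density/coverage-style metrics, not the paper's exact definitions.

```python
# Hedged sketch: a kNN-ball "density"-style score where each generated
# sample's contribution is clipped to 1, so outliers cannot inflate the
# aggregate. An assumed illustration, not the paper's exact metric.
import numpy as np

def clipped_density(real: np.ndarray, fake: np.ndarray, k: int = 5) -> float:
    # Radius of each real sample's k-nearest-neighbour ball (within real data).
    d_real = np.linalg.norm(real[:, None, :] - real[None, :, :], axis=-1)
    np.fill_diagonal(d_real, np.inf)
    radii = np.sort(d_real, axis=1)[:, k - 1]
    # For each fake sample, count how many real kNN balls contain it,
    # then clip that count to 1 before averaging.
    d_cross = np.linalg.norm(fake[:, None, :] - real[None, :, :], axis=-1)
    counts = (d_cross <= radii[None, :]).sum(axis=1)
    return float(np.clip(counts, 0, 1).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(200, 16))
    fake = rng.normal(size=(200, 16))
    print("clipped density:", round(clipped_density(real, fake), 3))
```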


Source: Arxiv

LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs

  • Researchers introduce a CPU-efficient meta-generation framework for producing Low-Rank Adapters (LoRAs) that fine-tune Large Language Models (LLMs).
  • The framework aims to make LoRA fine-tuning accessible to users with limited computational resources, such as standard laptop CPUs, by developing a meta-operator that maps an input dataset to LoRA weights using a bank of pre-trained adapters.
  • The proposed method constructs adapters through lightweight combinations of existing LoRAs directly on CPU, offering an alternative to GPU-based fine-tuning (a minimal combination sketch follows this list). Although the resulting adapters do not match the performance of GPU-trained ones, they consistently outperform the base Mistral model on downstream tasks.
  • The approach provides a practical path to LoRA fine-tuning without GPUs, showing potential benefits for users with limited computational resources.
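A minimal sketch of the combination idea: a new adapter's low-rank update is formed as a weighted sum of existing LoRA updates, entirely on CPU. The names and the placeholder weighting scheme are assumptions; the paper's meta-operator for choosing the weights is not reproduced.

```python
# Hedged sketch: build a new LoRA update as a weighted combination of existing
# adapters' updates, entirely on CPU. The weights here are placeholders; the
# paper's meta-operator for choosing them is not reproduced.
import numpy as np

def lora_delta(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """A LoRA update is the low-rank product B @ A (B: (d, r), A: (r, k))."""
    return B @ A

def combine_adapters(adapters, weights):
    """adapters: list of (A, B) pairs; weights: list of floats summing to 1."""
    return sum(w * lora_delta(A, B) for (A, B), w in zip(adapters, weights))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k, r = 64, 64, 8                      # assumed layer and rank sizes
    bank = [(rng.normal(size=(r, k)), rng.normal(size=(d, r))) for _ in range(3)]
    weights = [0.5, 0.3, 0.2]                # placeholder dataset-similarity weights
    delta_w = combine_adapters(bank, weights)
    print("combined update shape:", delta_w.shape)   # (64, 64)
```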


Source: Arxiv

TD-MPC-Opt: Distilling Model-Based Multi-Task Reinforcement Learning Agents

  • Researchers have developed a novel method for knowledge transfer in model-based reinforcement learning to address constraints of large world models in resource-limited environments.
  • Their technique efficiently distills a multi-task agent with 317M parameters into a compact 1M-parameter model, leading to improved performance on diverse tasks (a generic distillation-and-quantization sketch follows this list).
  • The distilled model achieved a state-of-the-art normalized score of 28.45, surpassing the original 1M parameter model score of 18.93, showcasing the effectiveness of the distillation process.
  • The researchers further optimized the distilled model through post-training quantization, reducing its size by approximately 50%, aiming to address practical deployment challenges in multi-task reinforcement learning systems.
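The summary describes distillation and post-training quantization at a high level; below is a generic teacher-student distillation step (matching teacher outputs) followed by dynamic quantization in PyTorch. Toy models stand in for the world-model agents, so this illustrates the two steps mentioned rather than the TD-MPC-Opt pipeline.

```python
# Hedged sketch: generic knowledge distillation (student matches teacher
# outputs) followed by post-training dynamic quantization. Toy models stand in
# for the 317M/1M world-model agents; this is not the TD-MPC-Opt pipeline.
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Linear(32, 1024), nn.ReLU(), nn.Linear(1024, 16))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for _ in range(100):                       # distillation loop on random states
    states = torch.randn(128, 32)
    with torch.no_grad():
        target = teacher(states)           # teacher predictions as soft targets
    loss = nn.functional.mse_loss(student(states), target)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Post-training dynamic quantization of the distilled student (Linear layers).
quantized_student = torch.ao.quantization.quantize_dynamic(
    student, {nn.Linear}, dtype=torch.qint8)
print(quantized_student)
```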


Source: Arxiv

MILP-SAT-GNN: Yet Another Neural SAT Solver

  • A new method named MILP-SAT-GNN combines Graph Neural Networks (GNNs) with Mixed Integer Linear Programming (MILP) techniques to solve SAT problems.
  • The method involves mapping k-CNF formulae to MILP problems, encoding them as weighted bipartite graphs, and training a GNN on those graphs to solve SAT problems (a minimal clause-variable graph-encoding sketch follows this list).
  • The approach shows stable outputs under clause and variable reordering, but has limitations in distinguishing satisfiable from unsatisfiable instances for foldable formulae.
  • The experimental evaluation demonstrates promising results, indicating the effectiveness of the method despite its simple neural architecture.
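A minimal sketch of the clause-variable bipartite encoding step: each variable and each clause becomes a node, and a signed edge records whether a variable appears positively or negated in a clause. The plain adjacency structure below is for illustration only; the paper's exact edge weights and the GNN itself are not reproduced.

```python
# Hedged sketch: encode a CNF formula as a signed clause-variable bipartite
# graph (edge weight +1 for a positive literal, -1 for a negated one).
# Illustrative preprocessing only; the paper's exact weights/GNN are omitted.
from typing import Dict, List, Tuple

def cnf_to_bipartite(clauses: List[List[int]]) -> Dict[Tuple[str, str], int]:
    """clauses use DIMACS-style literals, e.g. [[1, -2, 3], [-1, 2]]."""
    edges = {}
    for c_idx, clause in enumerate(clauses):
        for literal in clause:
            var_node = f"x{abs(literal)}"
            clause_node = f"c{c_idx}"
            edges[(clause_node, var_node)] = 1 if literal > 0 else -1
    return edges

if __name__ == "__main__":
    formula = [[1, -2, 3], [-1, 2], [2, 3]]   # (x1 v ~x2 v x3) ^ (~x1 v x2) ^ (x2 v x3)
    for edge, sign in cnf_to_bipartite(formula).items():
        print(edge, sign)
```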


Source: Arxiv

mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling

  • mGRADE is a hybrid-memory system that combines a temporal 1D convolution with learnable spacings and a minimal gated recurrent unit (a simplified hybrid sketch follows this list).
  • It aims to address the challenge of modeling short- and long-range dynamics on edge devices with tight memory constraints.
  • mGRADE effectively separates and preserves multi-scale temporal features, outperforming pure convolutional and recurrent models on various tasks.
  • The design of mGRADE allows for efficient memory usage, making it promising for memory-constrained multi-scale temporal processing at the edge.
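The sketch below chains a temporal 1D convolution with a minimal gated recurrent cell to show how the two memory mechanisms can be composed. Learnable spacings are replaced by a fixed dilation here for simplicity, so this is only an approximation of the described design, not mGRADE itself; all names and sizes are assumptions.

```python
# Hedged sketch: a temporal 1D convolution feeding a minimal gated recurrent
# cell. Learnable spacings are simplified to a fixed dilation, so this only
# approximates the hybrid memory design described; it is not mGRADE.
import torch
import torch.nn as nn

class MinimalGRUCell(nn.Module):
    """A stripped-down gated recurrent update: h = (1 - z) * h + z * h_tilde."""
    def __init__(self, dim: int):
        super().__init__()
        self.z = nn.Linear(2 * dim, dim)
        self.h_tilde = nn.Linear(2 * dim, dim)

    def forward(self, x_t, h):
        xh = torch.cat([x_t, h], dim=-1)
        z = torch.sigmoid(self.z(xh))
        return (1 - z) * h + z * torch.tanh(self.h_tilde(xh))

class ConvThenRecurrent(nn.Module):
    def __init__(self, dim: int = 16, kernel: int = 3, dilation: int = 4):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel, dilation=dilation,
                              padding=(kernel - 1) * dilation)  # causal-style padding
        self.cell = MinimalGRUCell(dim)

    def forward(self, x):                     # x: (batch, time, dim)
        local = self.conv(x.transpose(1, 2)).transpose(1, 2)[:, : x.size(1)]
        h = torch.zeros(x.size(0), x.size(2))
        for t in range(x.size(1)):            # recurrent pass over conv features
            h = self.cell(local[:, t], h)
        return h

if __name__ == "__main__":
    print(ConvThenRecurrent()(torch.randn(2, 20, 16)).shape)  # torch.Size([2, 16])
```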

