techminis
A naukri.com initiative

ML News

Source: Arxiv

Self-Guided Process Reward Optimization with Masked Step Advantage for Process Reinforcement Learning

  • Process Reinforcement Learning (PRL) has shown potential in enhancing the reasoning abilities of Large Language Models (LLMs).
  • A novel framework called Self-Guided Process Reward Optimization (SPRO) is proposed for process-aware RL, built on two key innovations, including the Masked Step Advantage named in the title (a hedged sketch of that idea follows this list).
  • SPRO outperforms vanilla GRPO with higher training efficiency and improved test accuracy, without incurring additional computational overhead.
  • Experimental results show SPRO maintains stable policy entropy, reduces response length, and prevents reward hacking, making it suitable for industrial implementation.
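The summary does not spell out the advantage computation, so the following is a minimal, illustrative sketch of a GRPO-style group-relative advantage with a per-step mask. It is not the paper's exact SPRO/MSA formulation; the function and tensor names are assumptions.

```python
# Hedged sketch: a GRPO-style group-relative advantage with a per-step mask.
# NOT the exact SPRO / Masked Step Advantage algorithm, only an illustration
# of the general idea; names and shapes are assumptions.
import torch

def masked_step_advantage(step_rewards: torch.Tensor,
                          step_mask: torch.Tensor) -> torch.Tensor:
    """step_rewards: (group_size, num_steps) per-step process rewards for a
    group of responses to the same prompt.
    step_mask: (group_size, num_steps), 1.0 for real reasoning steps and
    0.0 for padding beyond a response's length."""
    # Cumulative process reward up to each step.
    cum_rewards = torch.cumsum(step_rewards * step_mask, dim=-1)
    # Group baseline at each step, over responses that actually have that step.
    denom = step_mask.sum(dim=0).clamp(min=1.0)
    baseline = (cum_rewards * step_mask).sum(dim=0) / denom
    # Advantage of each step relative to the group, masked for padding.
    return (cum_rewards - baseline) * step_mask

if __name__ == "__main__":
    g, t = 4, 6  # group of 4 responses, up to 6 steps each
    rewards = torch.randn(g, t)
    mask = (torch.arange(t) < torch.tensor([[6], [4], [5], [3]])).float()
    print(masked_step_advantage(rewards, mask).shape)  # torch.Size([4, 6])
```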


Source: Arxiv

How Weight Resampling and Optimizers Shape the Dynamics of Continual Learning and Forgetting in Neural Networks

  • Recent work in continual learning has shown the benefits of resampling the weights of a neural network's last layer, a procedure known as 'zapping' (a minimal sketch follows this list).
  • Researchers investigated learning and forgetting patterns within convolutional neural networks during training under challenging scenarios like continual learning and few-shot transfer learning.
  • Experiments demonstrated that models trained with 'zapping' recover faster when transitioning to new domains.
  • The study also highlighted how the choice of optimizer affects the dynamics of learning and forgetting, leading to complex patterns of synergy or interference between tasks during sequential learning.
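The summary describes 'zapping' only at a high level; a minimal, assumed sketch of resampling (re-initializing) the final classification layer before moving to a new task might look like the following in PyTorch. The model and helper names are placeholders, not the authors' code.

```python
# Hedged sketch of "zapping": resample (re-initialize) the last-layer weights
# of a network before continuing training on a new task/domain.
# Illustrative only; not the authors' implementation.
import torch.nn as nn

def zap_last_layer(model: nn.Module) -> None:
    """Re-initialize (resample) the final Linear layer's weights in-place."""
    last_linear = None
    for module in model.modules():
        if isinstance(module, nn.Linear):
            last_linear = module              # remember the last Linear seen
    if last_linear is not None:
        nn.init.kaiming_normal_(last_linear.weight)
        if last_linear.bias is not None:
            nn.init.zeros_(last_linear.bias)

# Assumed usage in a toy continual-learning loop:
# for task_loader in tasks:                   # 'tasks' is a list of DataLoaders
#     zap_last_layer(model)                   # resample the head before each task
#     train_one_task(model, task_loader)      # placeholder training routine
```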


Source: Arxiv

A Privacy-Preserving Indoor Localization System based on Hierarchical Federated Learning

  • A new Privacy-Preserving Indoor Localization System based on Hierarchical Federated Learning is proposed to address the accuracy errors and privacy concerns of traditional indoor localization techniques.
  • The system utilizes Federated Learning (FL) with a Deep Neural Network (DNN) model for dynamic indoor localization, addressing privacy, bandwidth, and server reliability issues (a minimal federated-averaging sketch follows this list).
  • Experimental results show that the FL-based approach achieves performance similar to a centralized (CL) model while ensuring data privacy, bandwidth efficiency, and server reliability.
  • The research suggests that this FL approach offers a secure and efficient solution for indoor localization, contributing to advancements in privacy-enhanced indoor positioning systems.
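As a rough illustration of the federated setup described above, here is a minimal federated-averaging (FedAvg) sketch over client-held data. It is a generic FL skeleton under assumed names and shapes, not the paper's hierarchical system.

```python
# Hedged sketch: plain federated averaging (FedAvg) over clients' local DNNs.
# A generic illustration of the FL idea in the summary, not the paper's
# hierarchical architecture; all names and shapes are assumptions.
import copy
import torch
import torch.nn as nn

def local_update(model, loader, epochs=1, lr=1e-3):
    """Train a copy of the global model on one client's local data."""
    local = copy.deepcopy(model)
    opt = torch.optim.SGD(local.parameters(), lr=lr)
    loss_fn = nn.MSELoss()                    # e.g. regressing an (x, y) position
    for _ in range(epochs):
        for features, target in loader:
            opt.zero_grad()
            loss_fn(local(features), target).backward()
            opt.step()
    return local.state_dict()

def fed_avg(state_dicts):
    """Average client weights into a new global state dict."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(0)
    return avg

# One communication round (client_loaders is an assumed list of DataLoaders):
# client_states = [local_update(global_model, dl) for dl in client_loaders]
# global_model.load_state_dict(fed_avg(client_states))
```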


Source: Arxiv

GradMetaNet: An Equivariant Architecture for Learning on Gradients

  • Practitioners often treat gradients of neural networks as inputs to task-specific algorithms for optimization, editing, and analysis.
  • A new paper introduces GradMetaNet, an architecture designed specifically for processing gradients by following principles like equivariant design and efficient gradient representation (a generic equivariant set-network sketch follows this list).
  • GradMetaNet is demonstrated to outperform previous approaches in approximating natural gradient-based functions for tasks like learned optimization, INR editing, and loss landscape curvature estimation.
  • The architecture, based on simple equivariant blocks, is proven to be universal and effective on a variety of gradient-based tasks involving MLPs and transformers.
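The summary only names the design principles, so the following is a generic permutation-invariant (DeepSets-style) sketch that consumes a set of per-example gradient vectors. It illustrates the kind of symmetry-aware processing mentioned, not GradMetaNet's actual architecture; all names and sizes are assumptions.

```python
# Hedged sketch: a DeepSets-style network over a set of per-example gradients.
# Invariant to the order of examples; NOT GradMetaNet itself.
import torch
import torch.nn as nn

class GradSetEncoder(nn.Module):
    def __init__(self, grad_dim: int, hidden: int = 128, out_dim: int = 1):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(grad_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, grads: torch.Tensor) -> torch.Tensor:
        # grads: (num_examples, grad_dim) flattened per-example gradients.
        # Summing over the example axis makes the output independent of the
        # order of examples, a simple instance of the symmetry principle.
        return self.rho(self.phi(grads).sum(dim=0))

if __name__ == "__main__":
    per_example_grads = torch.randn(32, 512)            # assumed shapes
    print(GradSetEncoder(512)(per_example_grads).shape)  # torch.Size([1])
```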


Source: Arxiv

AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training

  • AsyncFlow is an asynchronous streaming RL framework designed for efficient post-training of large language models.
  • It aims to address scalability bottlenecks faced by traditional RL frameworks and challenges in complex dataflows, resource idling, and workload imbalance.
  • AsyncFlow introduces distributed data storage and transfer modules, automated pipeline overlapping, and producer-consumer-based asynchronous workflows for improved computational efficiency (a toy producer-consumer sketch follows this list).
  • The framework is decoupled from the underlying training and inference engines, allowing for modular, customizable use. Extensive experiments show a significant throughput improvement over existing baselines.
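To make the producer-consumer idea concrete, here is a toy asynchronous rollout/training loop using Python's asyncio queues. It only illustrates overlapping generation with training through a bounded buffer; it is not AsyncFlow's API, and all names are invented.

```python
# Hedged sketch: overlap rollout generation (producer) with training (consumer)
# through a bounded asyncio queue. Illustrative only; not AsyncFlow's API.
import asyncio
import random

async def rollout_producer(queue: asyncio.Queue, num_batches: int):
    for step in range(num_batches):
        await asyncio.sleep(random.uniform(0.01, 0.05))  # stand-in for generation
        await queue.put({"step": step, "trajectories": [0.0] * 8})
    await queue.put(None)                                # sentinel: no more data

async def train_consumer(queue: asyncio.Queue):
    while True:
        batch = await queue.get()
        if batch is None:
            break
        await asyncio.sleep(0.02)                        # stand-in for a train step
        print(f"trained on rollout batch {batch['step']}")

async def main():
    queue = asyncio.Queue(maxsize=4)   # bounded buffer keeps the stages in sync
    await asyncio.gather(rollout_producer(queue, 10), train_consumer(queue))

if __name__ == "__main__":
    asyncio.run(main())
```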


Source: Arxiv

GPT, But Backwards: Exactly Inverting Language Model Outputs

  • A new technique has been developed to reconstruct the exact input that led to a language model's output, aiding in post-incident analysis and fake output detection.
  • The technique, called SODA, is a gradient-based algorithm that outperforms existing methods at recovering shorter, out-of-distribution inputs from language models (a simplified inversion sketch follows this list).
  • The experiments conducted on LLMs ranging from 33M to 3B parameters showed that SODA was successful in fully recovering 79.5% of shorter inputs but faced challenges with longer input sequences.
  • The study suggests that standard deployment practices may currently offer sufficient protection against the potential misuse of this reconstructive method.
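The bullets describe SODA only as a gradient-based inversion method, so here is a heavily simplified sketch of the general idea: optimize a relaxed (soft) input representation so the model reproduces an observed output distribution. It uses a toy embedding-plus-linear "model" for self-containment and is not the SODA algorithm itself; all names and sizes are assumptions.

```python
# Hedged sketch of gradient-based input inversion: optimize a soft one-hot
# input so a (toy) model reproduces an observed output distribution.
# Not the SODA algorithm; everything here is a simplified illustration.
import torch
import torch.nn.functional as F

vocab, seq_len, dim = 100, 5, 32
torch.manual_seed(0)
emb = torch.randn(vocab, dim)                 # toy embedding table
head = torch.randn(dim, vocab)                # toy output head

def toy_model(soft_tokens):                   # soft_tokens: (seq_len, vocab)
    hidden = soft_tokens @ emb                # soft embedding lookup
    return (hidden @ head).mean(dim=0)        # (vocab,) toy "next-token" logits

# Observed output produced by some unknown discrete input.
true_input = F.one_hot(torch.randint(0, vocab, (seq_len,)), vocab).float()
target_logits = toy_model(true_input).detach()

# Invert: optimize logits over the input tokens to match the observed output.
input_logits = torch.zeros(seq_len, vocab, requires_grad=True)
opt = torch.optim.Adam([input_logits], lr=0.1)
for _ in range(500):
    opt.zero_grad()
    soft_tokens = F.softmax(input_logits, dim=-1)
    loss = F.mse_loss(toy_model(soft_tokens), target_logits)
    loss.backward()
    opt.step()

candidate = input_logits.argmax(dim=-1)
print("candidate input tokens:", candidate.tolist())
```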


Source: Arxiv

PERTINENCE: Input-based Opportunistic Neural Network Dynamic Execution

  • Deep neural networks (DNNs) are widely used for modeling complex patterns in various domains but can be resource-intensive.
  • A new method, PERTINENCE, dynamically selects suitable models from a pre-trained set based on input complexity to improve efficiency without compromising accuracy (a toy input-based selection sketch follows this list).
  • Its genetic algorithm-based approach balances overall accuracy and computational efficiency by optimizing the selection process.
  • The method showcased promising results on CIFAR-10, CIFAR-100, and TinyImageNet datasets, achieving comparable accuracy with up to 36% fewer operations than existing state-of-the-art models.
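The following is a toy sketch of input-based dynamic model selection: a cheap complexity score routes each input to a small or a large pre-trained model. The score and threshold are invented for illustration and do not correspond to PERTINENCE's learned, genetic-algorithm-optimized selection policy.

```python
# Hedged sketch: route each input to a cheaper or a larger model based on a
# simple complexity score. Illustrative only; PERTINENCE learns its selection
# policy (via a genetic algorithm), which is not reproduced here.
import torch
import torch.nn as nn

small_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
large_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512),
                            nn.ReLU(), nn.Linear(512, 10))

def complexity_score(image: torch.Tensor) -> float:
    # Invented proxy: pixel variance as a crude measure of input complexity.
    return image.var().item()

def predict(image: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    model = small_model if complexity_score(image) < threshold else large_model
    with torch.no_grad():
        return model(image.unsqueeze(0)).argmax(dim=-1)

if __name__ == "__main__":
    batch = torch.rand(4, 3, 32, 32)          # stand-in for CIFAR-10 images
    print([predict(img).item() for img in batch])
```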


Source: Arxiv

Relational Causal Discovery with Latent Confounders

  • Estimating causal effects from real-world relational data can be challenging when the underlying causal model and potential confounders are unknown.
  • A new algorithm called RelFCI has been proposed to address the challenge of learning causal models with latent confounders from relational data.
  • RelFCI builds upon existing causal inference and relational causal discovery algorithms to provide sound and complete causal discovery in relational domains.
  • Experimental results show the effectiveness of RelFCI in identifying the correct causal structure in relational causal models with latent confounders.


Source: Arxiv

Revisiting Learning Rate Control

  • The learning rate is a crucial hyperparameter in deep learning, prompting research in both AutoML and deep learning on how to control it effectively.
  • This paper compares different approaches to learning rate control, including classic optimization and online scheduling based on gradient statistics (a minimal scheduler-comparison sketch follows this list).
  • Results show that while certain methods perform well on specific deep learning tasks, they lack reliability across different settings, emphasizing the need for improved algorithm selection in learning rate control.
  • There is a growing trend indicating that hyperparameter optimization approaches are less effective as models and tasks become more complex, suggesting the importance of exploring new directions like finetunable methods and meta-learning in AutoML.
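As a deliberately simple illustration of comparing learning rate control strategies, the sketch below trains the same toy model under a constant learning rate and under a cosine annealing schedule and reports the final losses. It is a generic harness under assumed settings, not the paper's benchmark.

```python
# Hedged sketch: compare two learning-rate control strategies on a toy task.
# A generic harness for illustration; not the paper's experimental setup.
import torch
import torch.nn as nn

def train(schedule: str, steps: int = 200, lr: float = 0.1) -> float:
    torch.manual_seed(0)
    x = torch.randn(256, 10)
    y = x @ torch.randn(10, 1) + 0.1 * torch.randn(256, 1)
    model = nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    sched = (torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=steps)
             if schedule == "cosine" else None)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
        if sched is not None:
            sched.step()          # online control of the learning rate
    return loss.item()

if __name__ == "__main__":
    for schedule in ("constant", "cosine"):
        print(schedule, train(schedule))
```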


Source: Arxiv

Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training

  • Gradient-based optimization in deep learning raises privacy and security concerns due to data poisoning attacks and overfitting risks.
  • Black box optimization methods offer an alternative by treating the model as an opaque function, but face challenges in scalability and computational costs, especially in large language models (LLMs).
  • A new method called BBoxER is introduced for LLM post-training, inducing an information bottleneck via implicit compression of the training data (a generic black-box search sketch follows this list).
  • BBoxER provides theoretical bounds on generalization, privacy, data poisoning attacks, and robustness to extraction attacks, demonstrating promising results in experiments with LLMs.
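To illustrate the black-box (gradient-free) optimization idea the summary refers to, here is a minimal (1+1) evolution strategy over a small parameter vector, with the objective treated as an opaque function. It is a textbook sketch, not BBoxER, and the objective is a stand-in for an expensive LLM evaluation.

```python
# Hedged sketch: a (1+1) evolution strategy treating the objective as a black
# box (no gradients). A textbook illustration, not the BBoxER method; the
# objective here is a stand-in for an expensive LLM evaluation.
import numpy as np

def black_box_score(params: np.ndarray) -> float:
    # Placeholder objective; in the real setting this would evaluate the model
    # with the candidate parameters and return a validation score.
    return -np.sum((params - 0.3) ** 2)

def one_plus_one_es(dim: int = 8, iters: int = 200, sigma: float = 0.1):
    rng = np.random.default_rng(0)
    best = rng.normal(size=dim)
    best_score = black_box_score(best)
    for _ in range(iters):
        candidate = best + sigma * rng.normal(size=dim)
        score = black_box_score(candidate)
        if score >= best_score:            # keep the candidate if no worse
            best, best_score = candidate, score
    return best, best_score

if __name__ == "__main__":
    params, score = one_plus_one_es()
    print("best score:", round(score, 4))
```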


Source: Arxiv

Enhanced Generative Model Evaluation with Clipped Density and Coverage

  • Reliably evaluating the sample quality of generative models for critical applications remains difficult, particularly with respect to the notions of fidelity and coverage.
  • To address this issue, two novel metrics, Clipped Density and Clipped Coverage, have been introduced to prevent out-of-distribution samples from biasing the aggregated values (a simplified clipping sketch follows this list).
  • These metrics exhibit linear score degradation as the proportion of poor samples increases, making them easily interpretable as proportions of good samples.
  • Extensive experiments show that Clipped Density and Clipped Coverage outperform existing metrics in robustness, sensitivity, and interpretability when evaluating generative models.
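The sketch below shows a simplified nearest-neighbour density-style score in which each generated sample's contribution is clipped to at most 1, so a single outlier cannot inflate the aggregate. It is an assumed illustration of the clipping idea on top of standard density/coverage-style metrics, not the paper's exact definitions.

```python
# Hedged sketch: a kNN-ball "density"-style score where each generated
# sample's contribution is clipped to 1, so outliers cannot inflate the
# aggregate. An assumed illustration, not the paper's exact metric.
import numpy as np

def clipped_density(real: np.ndarray, fake: np.ndarray, k: int = 5) -> float:
    # Radius of each real sample's k-nearest-neighbour ball (within real data).
    d_real = np.linalg.norm(real[:, None, :] - real[None, :, :], axis=-1)
    np.fill_diagonal(d_real, np.inf)
    radii = np.sort(d_real, axis=1)[:, k - 1]
    # For each fake sample, count how many real kNN balls contain it,
    # then clip that count to 1 before averaging.
    d_cross = np.linalg.norm(fake[:, None, :] - real[None, :, :], axis=-1)
    counts = (d_cross <= radii[None, :]).sum(axis=1)
    return float(np.clip(counts, 0, 1).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(200, 16))
    fake = rng.normal(size=(200, 16))
    print("clipped density:", round(clipped_density(real, fake), 3))
```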


Source: Arxiv

LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs

  • Researchers introduce a CPU-efficient meta-generation framework for producing Low-Rank Adapters (LoRAs) that fine-tune Large Language Models (LLMs).
  • The framework aims to make LoRA fine-tuning accessible to users with limited computational resources, such as standard laptop CPUs, by developing a meta-operator that maps an input dataset to LoRA weights using a bank of pre-trained adapters.
  • The proposed method constructs adapters through lightweight combinations of existing LoRAs directly on CPU, offering an alternative to GPU-based fine-tuning (a minimal combination sketch follows this list). Although the resulting adapters do not match the performance of GPU-trained ones, they consistently outperform the base Mistral model on downstream tasks.
  • The approach provides a practical path to LoRA fine-tuning without GPUs, showing potential benefits for users with limited computational resources.
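A minimal sketch of the combination idea: a new adapter's low-rank update is formed as a weighted sum of existing LoRA updates, entirely on CPU. The names and the placeholder weighting scheme are assumptions; the paper's meta-operator for choosing the weights is not reproduced.

```python
# Hedged sketch: build a new LoRA update as a weighted combination of existing
# adapters' updates, entirely on CPU. The weights here are placeholders; the
# paper's meta-operator for choosing them is not reproduced.
import numpy as np

def lora_delta(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """A LoRA update is the low-rank product B @ A (B: (d, r), A: (r, k))."""
    return B @ A

def combine_adapters(adapters, weights):
    """adapters: list of (A, B) pairs; weights: list of floats summing to 1."""
    return sum(w * lora_delta(A, B) for (A, B), w in zip(adapters, weights))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k, r = 64, 64, 8                      # assumed layer and rank sizes
    bank = [(rng.normal(size=(r, k)), rng.normal(size=(d, r))) for _ in range(3)]
    weights = [0.5, 0.3, 0.2]                # placeholder dataset-similarity weights
    delta_w = combine_adapters(bank, weights)
    print("combined update shape:", delta_w.shape)   # (64, 64)
```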


Source: Arxiv

TD-MPC-Opt: Distilling Model-Based Multi-Task Reinforcement Learning Agents

  • Researchers have developed a novel method for knowledge transfer in model-based reinforcement learning to address constraints of large world models in resource-limited environments.
  • Their technique efficiently distills a multi-task agent with 317M parameters into a compact 1M-parameter model, leading to improved performance on diverse tasks (a generic distillation-and-quantization sketch follows this list).
  • The distilled model achieved a state-of-the-art normalized score of 28.45, surpassing the original 1M parameter model score of 18.93, showcasing the effectiveness of the distillation process.
  • The researchers further optimized the distilled model through post-training quantization, reducing its size by approximately 50%, aiming to address practical deployment challenges in multi-task reinforcement learning systems.
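The summary describes distillation and post-training quantization at a high level; below is a generic teacher-student distillation step (matching teacher outputs) followed by dynamic quantization in PyTorch. Toy models stand in for the world-model agents, so this illustrates the two steps mentioned rather than the TD-MPC-Opt pipeline.

```python
# Hedged sketch: generic knowledge distillation (student matches teacher
# outputs) followed by post-training dynamic quantization. Toy models stand in
# for the 317M/1M world-model agents; this is not the TD-MPC-Opt pipeline.
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Linear(32, 1024), nn.ReLU(), nn.Linear(1024, 16))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for _ in range(100):                       # distillation loop on random states
    states = torch.randn(128, 32)
    with torch.no_grad():
        target = teacher(states)           # teacher predictions as soft targets
    loss = nn.functional.mse_loss(student(states), target)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Post-training dynamic quantization of the distilled student (Linear layers).
quantized_student = torch.ao.quantization.quantize_dynamic(
    student, {nn.Linear}, dtype=torch.qint8)
print(quantized_student)
```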


Source: Arxiv

MILP-SAT-GNN: Yet Another Neural SAT Solver

  • A new method named MILP-SAT-GNN combines Graph Neural Networks (GNNs) with Mixed Integer Linear Programming (MILP) techniques to solve SAT problems.
  • The method involves mapping k-CNF formulae to MILP problems, encoding them as weighted bipartite graphs, and training a GNN on those graphs to solve SAT problems (a minimal clause-variable graph-encoding sketch follows this list).
  • The approach shows stable outputs under clause and variable reordering, but has limitations in distinguishing satisfiable from unsatisfiable instances for foldable formulae.
  • The experimental evaluation demonstrates promising results, indicating the effectiveness of the method despite its simple neural architecture.
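A minimal sketch of the clause-variable bipartite encoding step: each variable and each clause becomes a node, and a signed edge records whether a variable appears positively or negated in a clause. The plain adjacency structure below is for illustration only; the paper's exact edge weights and the GNN itself are not reproduced.

```python
# Hedged sketch: encode a CNF formula as a signed clause-variable bipartite
# graph (edge weight +1 for a positive literal, -1 for a negated one).
# Illustrative preprocessing only; the paper's exact weights/GNN are omitted.
from typing import Dict, List, Tuple

def cnf_to_bipartite(clauses: List[List[int]]) -> Dict[Tuple[str, str], int]:
    """clauses use DIMACS-style literals, e.g. [[1, -2, 3], [-1, 2]]."""
    edges = {}
    for c_idx, clause in enumerate(clauses):
        for literal in clause:
            var_node = f"x{abs(literal)}"
            clause_node = f"c{c_idx}"
            edges[(clause_node, var_node)] = 1 if literal > 0 else -1
    return edges

if __name__ == "__main__":
    formula = [[1, -2, 3], [-1, 2], [2, 3]]   # (x1 v ~x2 v x3) ^ (~x1 v x2) ^ (x2 v x3)
    for edge, sign in cnf_to_bipartite(formula).items():
        print(edge, sign)
```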


Source: Arxiv

mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling

  • mGRADE is a hybrid-memory system that combines a temporal 1D convolution with learnable spacings and a minimal gated recurrent unit (a simplified hybrid sketch follows this list).
  • It aims to address the challenge of modeling short- and long-range dynamics on edge devices with tight memory constraints.
  • mGRADE effectively separates and preserves multi-scale temporal features, outperforming pure convolutional and recurrent models on various tasks.
  • The design of mGRADE allows for efficient memory usage, making it promising for memory-constrained multi-scale temporal processing at the edge.
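The sketch below chains a temporal 1D convolution with a minimal gated recurrent cell to show how the two memory mechanisms can be composed. Learnable spacings are replaced by a fixed dilation here for simplicity, so this is only an approximation of the described design, not mGRADE itself; all names and sizes are assumptions.

```python
# Hedged sketch: a temporal 1D convolution feeding a minimal gated recurrent
# cell. Learnable spacings are simplified to a fixed dilation, so this only
# approximates the hybrid memory design described; it is not mGRADE.
import torch
import torch.nn as nn

class MinimalGRUCell(nn.Module):
    """A stripped-down gated recurrent update: h = (1 - z) * h + z * h_tilde."""
    def __init__(self, dim: int):
        super().__init__()
        self.z = nn.Linear(2 * dim, dim)
        self.h_tilde = nn.Linear(2 * dim, dim)

    def forward(self, x_t, h):
        xh = torch.cat([x_t, h], dim=-1)
        z = torch.sigmoid(self.z(xh))
        return (1 - z) * h + z * torch.tanh(self.h_tilde(xh))

class ConvThenRecurrent(nn.Module):
    def __init__(self, dim: int = 16, kernel: int = 3, dilation: int = 4):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel, dilation=dilation,
                              padding=(kernel - 1) * dilation)  # causal-style padding
        self.cell = MinimalGRUCell(dim)

    def forward(self, x):                     # x: (batch, time, dim)
        local = self.conv(x.transpose(1, 2)).transpose(1, 2)[:, : x.size(1)]
        h = torch.zeros(x.size(0), x.size(2))
        for t in range(x.size(1)):            # recurrent pass over conv features
            h = self.cell(local[:, t], h)
        return h

if __name__ == "__main__":
    print(ConvThenRecurrent()(torch.randn(2, 20, 16)).shape)  # torch.Size([2, 16])
```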

