ML News

Arxiv · 4d · Image Credit: Arxiv

Reinforce LLM Reasoning through Multi-Agent Reflection

  • Leveraging more test-time computation can enhance the reasoning capabilities of large language models (LLMs).
  • The verify-and-improve paradigm allows dynamic solution exploration and feedback incorporation for LLMs.
  • A new reinforcement learning algorithm called DPSDP is introduced to improve LLM performance by training an actor-critic system to refine answers iteratively (the verify-and-improve loop is sketched below).
  • Empirical results show that using DPSDP with various base models leads to enhancements on both in- and out-of-distribution benchmarks, demonstrating the benefits of multi-agent collaboration.
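
DPSDP's training procedure is not reproduced here; purely as a hypothetical illustration of the verify-and-improve loop described above, the sketch below wires an actor and a critic together. The names `actor`, `critic`, and `refine` are assumptions for illustration, not the paper's API.

    # Minimal sketch of an actor-critic refinement loop (verify-and-improve).
    # `actor` and `critic` are hypothetical stand-ins for trained LLM calls;
    # DPSDP itself trains both components with reinforcement learning.
    def actor(question: str, feedback: str | None = None) -> str:
        # Placeholder: a trained actor would generate or revise an answer here.
        base = f"answer({question})"
        return base if feedback is None else f"{base} revised per: {feedback}"

    def critic(question: str, answer: str) -> tuple[bool, str]:
        # Placeholder verdict: a trained critic would verify and explain flaws.
        revised = "revised per" in answer
        return revised, "accepted" if revised else "step 2 looks wrong; recheck it"

    def refine(question: str, max_rounds: int = 3) -> str:
        answer = actor(question)                     # initial attempt
        for _ in range(max_rounds):
            ok, feedback = critic(question, answer)  # verify
            if ok:
                break
            answer = actor(question, feedback)       # improve with feedback
        return answer

    print(refine("What is 17 * 24?"))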

Arxiv · 4d · Image Credit: Arxiv

Reinforcement Learning Teachers of Test Time Scaling

  • Training reasoning language models with reinforcement learning for one-hot correctness relies on the LM's ability to explore and solve tasks on its own.
  • A new framework introduces Reinforcement-Learned Teachers (RLTs), which avoid RL's exploration challenge by being optimized for effective downstream distillation instead.
  • RLTs are prompted with both the question and its solution, and asked to produce detailed explanations tailored for students (see the prompt sketch below).
  • In practice, 7B RLTs outperform existing distillation pipelines, transfer effectively to out-of-distribution tasks, and improve the efficiency of the RL reasoning framework.
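
As a rough, hypothetical sketch of the prompting setup described above (the template wording and function name are assumptions, not the paper's), a teacher that already sees the solution only has to explain it:

    # Hypothetical sketch: the teacher is prompted with BOTH the question and
    # the ground-truth solution and asked only to explain, sidestepping the
    # exploration problem of solving from scratch with RL.
    def build_teacher_prompt(question: str, solution: str) -> str:
        return (
            "You are a teacher. The problem and its correct solution are given.\n"
            f"Problem: {question}\n"
            f"Solution: {solution}\n"
            "Write a detailed, step-by-step explanation a student could learn from."
        )

    prompt = build_teacher_prompt("Integrate x^2 from 0 to 1.", "1/3")
    print(prompt)  # the teacher's explanation then becomes distillation data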

Arxiv · 4d · Image Credit: Arxiv

Spatiotemporal deep learning models for detection of rapid intensification in cyclones

  • Cyclone rapid intensification is an increase in cyclone wind intensity of at least 30 knots within 24 hours (a labeling sketch follows below).
  • Deep learning, ensemble learning, and data augmentation frameworks are evaluated for detecting rapid intensification from wind intensity and spatial coordinates.
  • Because conventional data augmentation methods cannot replicate cyclones that undergo rapid intensification, deep learning models are used to address the class-imbalance problem.
  • Results show that data augmentation improves rapid-intensification detection, with spatial coordinates playing a critical role in the models, paving the way for synthetic data generation in spatiotemporal data with extreme events.
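
The 30-knot/24-hour definition translates directly into a labeling rule. Below is a minimal sketch assuming 6-hourly best-track records; the cadence and function name are assumptions, not the paper's setup.

    import numpy as np

    def label_rapid_intensification(wind_kt: np.ndarray, hours_per_step: int = 6,
                                    threshold_kt: float = 30.0) -> np.ndarray:
        """Label 1 where wind intensity rises by >= threshold_kt within 24 h."""
        steps = int(24 / hours_per_step)          # timesteps spanning 24 hours
        labels = np.zeros(len(wind_kt), dtype=int)
        for t in range(len(wind_kt) - steps):
            if wind_kt[t + steps] - wind_kt[t] >= threshold_kt:
                labels[t] = 1                     # RI event begins at time t
        return labels

    track = np.array([40, 45, 55, 70, 85, 90, 92])  # 6-hourly winds (knots)
    print(label_rapid_intensification(track))        # -> [1 1 1 0 0 0 0]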

Arxiv · 4d · Image Credit: Arxiv

FUSE: Measure-Theoretic Compact Fuzzy Set Representation for Taxonomy Expansion

  • Taxonomy Expansion can be formulated as a set representation learning task using fuzzy sets.
  • Existing works model sets as vectors or geometric objects, which are not closed under set operations.
  • FUSE (Fuzzy Set Embedding) is a new formulation that approximates set representation as a fuzzy set, preserving information efficiently (closure under set operations is illustrated below).
  • Empirical results show FUSE achieves up to 23% improvement in taxonomy expansion compared to existing baselines.
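
FUSE's measure-theoretic construction is not reproduced here; the toy sketch below only illustrates the closure property the summary highlights: representing a set as a membership vector in [0, 1]^d keeps intersections and unions inside the same representation space. All values and the subsethood score are illustrative assumptions.

    import numpy as np

    # A fuzzy set over d "elements" is a membership vector in [0, 1]^d
    # (here obtained via a sigmoid over learnable logits). Unlike plain
    # vectors or geometric boxes, min/max intersection and union stay
    # inside the same representation.
    def membership(logits: np.ndarray) -> np.ndarray:
        return 1.0 / (1.0 + np.exp(-logits))  # degrees of membership

    a = membership(np.array([2.0, -1.0, 0.5]))
    b = membership(np.array([-0.5, 1.5, 0.0]))

    intersection = np.minimum(a, b)            # still a valid fuzzy set
    union = np.maximum(a, b)                   # closed under set operations
    subsethood = intersection.sum() / a.sum()  # graded "is-a" score for taxonomy edges
    print(intersection, union, round(subsethood, 3))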

Arxiv · 4d · Image Credit: Arxiv

Learning to Hear Broken Motors: Signature-Guided Data Augmentation for Induction-Motor Diagnostics

  • Machine learning algorithms can enhance the diagnostic performance of three-phase induction motors when combined with traditional signature analysis.
  • A novel unsupervised anomaly-generation methodology, Signature-Guided Data Augmentation (SGDA), is proposed to synthesize realistic faults in healthy current signals.
  • SGDA leverages Motor Current Signature Analysis to create diverse anomalies directly in the frequency domain, without complex simulations, improving diagnostic accuracy and reliability (a frequency-domain injection sketch follows below).
  • This hybrid approach shows promise for motor diagnostics, providing a robust and efficient solution for industrial applications.
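
As a hedged illustration of frequency-domain fault injection: the sideband frequencies, amplitude, and sample rate below are illustrative assumptions, not the paper's parameters.

    import numpy as np

    # Inject fault-like sidebands around the supply frequency of an
    # idealized healthy stator-current signal.
    fs, f_supply = 10_000, 50.0                   # sample rate (Hz), mains (Hz)
    t = np.arange(0, 1.0, 1 / fs)
    healthy = np.sin(2 * np.pi * f_supply * t)    # idealized stator current

    spectrum = np.fft.rfft(healthy)
    freqs = np.fft.rfftfreq(len(healthy), 1 / fs)
    for f_fault in (f_supply - 5, f_supply + 5):  # e.g. rotor-fault sidebands
        idx = np.argmin(np.abs(freqs - f_fault))
        spectrum[idx] += 0.05 * np.abs(spectrum).max()  # add a small fault peak
    synthetic_fault = np.fft.irfft(spectrum, n=len(healthy))
    # `synthetic_fault` can now serve as a labeled faulty training sample.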

Arxiv · 4d · Image Credit: Arxiv

Improved Scaling Laws in Linear Regression via Data Reuse

  • Neural scaling laws indicate that the test error of large language models decreases as model size and data size increase.
  • Data reuse can enhance scaling laws in linear regression by improving test error bounds on models trained using multi-pass stochastic gradient descent.
  • The study shows that, with data reuse, multi-pass SGD achieves better test error than one-pass SGD in certain data-constrained regimes (a toy comparison follows below).
  • Numerical simulations validate the theoretical results presented in the research work.
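
A toy experiment conveys the claim; this is only an illustration of the phenomenon, not the paper's setting or bounds, and all sizes and step sizes are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 100, 20                                  # data-constrained regime
    X, X_test = rng.normal(size=(n, d)), rng.normal(size=(2000, d))
    w_true = rng.normal(size=d)
    y = X @ w_true + 0.1 * rng.normal(size=n)
    y_test = X_test @ w_true + 0.1 * rng.normal(size=2000)

    def multi_pass_sgd(passes: int, lr: float = 0.01) -> float:
        w = np.zeros(d)
        for _ in range(passes):                     # data reuse = extra passes
            for i in rng.permutation(n):
                w += lr * (y[i] - X[i] @ w) * X[i]  # per-sample SGD step
        return float(np.mean((X_test @ w - y_test) ** 2))

    print("one-pass test MSE: ", multi_pass_sgd(1))
    print("five-pass test MSE:", multi_pass_sgd(5))  # reuse helps here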

Arxiv · 4d · Image Credit: Arxiv

Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood

  • Offline RL struggles with distributional shifts, leading to $Q$-value overestimation for out-of-distribution actions.
  • Existing methods impose constraints which can be too conservative for evaluating out-of-distribution regions, hindering $Q$-function generalization and policy improvement.
  • A novel approach called Smooth Q-function OOD Generalization (SQOG) enhances $Q$-value estimation by smoothing out-of-distribution $Q$-values with neighboring in-sample $Q$-values within the Convex Hull and its Neighborhood (CHN); a toy smoothing sketch follows below.
  • The proposed Smooth Bellman Operator (SBO) theoretically approximates true $Q$-values for both in-sample and out-of-distribution actions within CHN, and the practical SQOG algorithm outperforms existing state-of-the-art methods in performance and computational efficiency on D4RL benchmarks.
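
The paper's Smooth Bellman Operator is not reproduced here; the toy sketch below only conveys the smoothing intuition: score an out-of-distribution action by distance-weighted averaging of nearby in-sample $Q$-values, which is most defensible inside the convex hull of the dataset's actions. The kernel, bandwidth, and data are illustrative assumptions.

    import numpy as np

    def smoothed_q(a_ood: np.ndarray, actions: np.ndarray,
                   q_values: np.ndarray, tau: float = 0.1) -> float:
        # Gaussian-kernel smoothing over in-sample action Q-values.
        d2 = np.sum((actions - a_ood) ** 2, axis=1)
        w = np.exp(-d2 / tau)
        return float(np.sum(w * q_values) / np.sum(w))

    actions = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # in-sample actions
    q_vals = np.array([1.0, 2.0, 0.5])
    print(smoothed_q(np.array([0.5, 0.5]), actions, q_vals))  # OOD query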

Arxiv · 4d · Image Credit: Arxiv

Online Learning-guided Learning Rate Adaptation via Gradient Alignment

  • A new framework called GALA (Gradient Alignment-based Learning rate Adaptation) has been proposed for dynamically adjusting the learning rate in large-scale deep learning models.
  • GALA tracks the alignment between consecutive gradients and uses a local curvature estimate to adapt the learning rate effectively (the alignment signal is sketched below).
  • The method formulates the learning rate selection problem as a one-dimensional online learning problem and pairs it with an algorithm like Follow-the-Regularized-Leader.
  • Empirical results show that optimizers like SGD and Adam, combined with GALA, perform well across various initial learning rates without requiring extensive tuning.
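
GALA's FTRL-based update is not reproduced; the sketch below only shows the alignment signal itself on a toy quadratic: the step size grows when consecutive gradients point the same way and shrinks when they oscillate. The multiplicative rule and its 0.1 gain are assumptions for illustration.

    import numpy as np

    def quad_grad(w: np.ndarray) -> np.ndarray:
        # Gradient of f(w) = 0.5 * ||w||^2 is w (copied to avoid aliasing).
        return w.copy()

    w, lr, prev_g = np.array([5.0, -3.0]), 0.1, None
    for _ in range(20):
        g = quad_grad(w)
        if prev_g is not None:
            cos = g @ prev_g / (np.linalg.norm(g) * np.linalg.norm(prev_g) + 1e-12)
            lr *= np.exp(0.1 * cos)   # aligned -> larger lr, opposed -> smaller
        w -= lr * g
        prev_g = g
    print(w, lr)                      # w near the optimum, lr self-adjusted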

Arxiv · 4d · Image Credit: Arxiv

Boosting Gradient Leakage Attacks: Data Reconstruction in Realistic FL Settings

  • Federated Learning (FL) allows collaborative model training while protecting privacy by not exposing raw data.
  • Gradient Leakage Attacks (GLAs) exploit gradients shared during training to reconstruct clients' data, raising privacy concerns.
  • Recent empirical evidence shows that data can still be effectively reconstructed in realistic FL settings, contrary to the prior belief that such settings were safe.
  • A novel attack called FedLeak demonstrates these vulnerabilities, underscoring the need for stronger defense methods in FL systems (a generic gradient-inversion sketch follows below).
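
FedLeak's specific technique is not detailed in this summary; the sketch below is a generic gradient-inversion attack in the same family, showing why shared gradients can leak training data. The tiny model, the known label, and all hyperparameters are simplifying assumptions.

    import torch

    torch.manual_seed(0)
    model = torch.nn.Linear(8, 2)
    loss_fn = torch.nn.CrossEntropyLoss()

    x_true = torch.randn(1, 8)                    # a client's private sample
    y_true = torch.tensor([1])                    # label assumed known here
    true_grads = torch.autograd.grad(loss_fn(model(x_true), y_true),
                                     model.parameters())

    x_dummy = torch.randn(1, 8, requires_grad=True)
    opt = torch.optim.Adam([x_dummy], lr=0.1)
    for _ in range(300):
        opt.zero_grad()
        g = torch.autograd.grad(loss_fn(model(x_dummy), y_true),
                                model.parameters(), create_graph=True)
        match = sum(((a - b) ** 2).sum() for a, b in zip(g, true_grads))
        match.backward()                          # pull dummy gradient toward shared one
        opt.step()
    # Distance shrinks as the dummy input converges to the private one.
    print(float((x_dummy - x_true).norm()))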

Arxiv · 4d · Image Credit: Arxiv

Learning to Lead: Incentivizing Strategic Agents in the Dark

  • The study considers online learning in a generalized principal-agent model where strategic agents hold private types and private rewards.
  • The principal aims to learn an optimal coordination mechanism that minimizes strategic regret.
  • A sample-efficient algorithm is developed, combining a delaying mechanism, a reward-estimation framework, and the LinUCB bandit algorithm (a minimal LinUCB sketch follows below).
  • A near-optimal regret bound is established for learning the principal's optimal policy in this challenging setting.
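
Of the three components, LinUCB is a standard contextual-bandit routine; here is a minimal sketch of it alone. The paper's delaying mechanism and reward-estimation framework are omitted, and the feature setup is illustrative.

    import numpy as np

    # Minimal LinUCB: keep ridge-regression statistics (A, b) and pick the
    # arm whose context maximizes estimated reward plus an exploration bonus.
    d, alpha = 3, 1.0
    A, b = np.eye(d), np.zeros(d)

    def choose(contexts: list[np.ndarray]) -> int:
        theta = np.linalg.solve(A, b)             # ridge estimate of rewards
        ucb = [x @ theta + alpha * np.sqrt(x @ np.linalg.solve(A, x))
               for x in contexts]
        return int(np.argmax(ucb))

    def update(x: np.ndarray, reward: float) -> None:
        global A, b                               # toy module-level state
        A += np.outer(x, x)                       # accumulate statistics
        b += reward * x

    ctxs = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
    arm = choose(ctxs)
    update(ctxs[arm], reward=1.0)                 # observe and learn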

Arxiv · 4d · Image Credit: Arxiv

MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning

  • MOBODY is a Model-Based Off-Dynamics offline RL algorithm designed to address the limitations of existing off-dynamics offline RL methods.
  • It enables exploration of the target domain by generating synthetic transitions through model rollouts, which serve as data augmentation during offline policy learning (see the rollout sketch below).
  • MOBODY learns target dynamics using representation learning that discovers a shared latent representation of states and transitions across domains.
  • Evaluation on MuJoCo benchmarks shows that MOBODY outperforms state-of-the-art baselines, particularly in challenging scenarios.
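
MOBODY's shared-latent dynamics learning is not reproduced; the toy sketch below only shows the augmentation mechanic: branch short rollouts of a learned target-dynamics model from dataset states and add the synthetic transitions to the offline buffer. The dynamics function, dimensions, and rollout length are fabricated stand-ins.

    import numpy as np

    def learned_dynamics(s: np.ndarray, a: np.ndarray) -> np.ndarray:
        # Stand-in for the trained target-domain dynamics model.
        return 0.9 * s + 0.1 * a

    rng = np.random.default_rng(0)
    buffer = [(rng.normal(size=2), rng.normal(size=2)) for _ in range(8)]

    synthetic = []
    for s, _ in buffer:                      # branch a short model rollout
        for _ in range(3):
            a_new = rng.normal(size=2)       # e.g. drawn from current policy
            s_next = learned_dynamics(s, a_new)
            synthetic.append((s, a_new, s_next))
            s = s_next
    print(len(synthetic), "synthetic target-domain transitions")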

Arxiv · 4d · Image Credit: Arxiv

How to Provably Improve Return Conditioned Supervised Learning?

  • Return-Conditioned Supervised Learning (RCSL) simplifies policy learning in sequential decision-making problems by framing it as a supervised learning task with state and return inputs, enhancing stability compared to traditional offline RL algorithms.
  • Reinforced RCSL is introduced to address RCSL's key limitation, namely that performance is capped by the quality of the behavior policy in the dataset; it incorporates an in-distribution optimal return-to-go that identifies the best achievable future return from the current state (a toy computation follows below).
  • The theoretical analysis shows that Reinforced RCSL consistently outperforms standard RCSL, offering a more effective approach with simplified return augmentation techniques.
  • Empirical results support the superiority of Reinforced RCSL over RCSL, demonstrating improved performance across various benchmarks in modern decision-making tasks.
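
A toy computation of the in-distribution optimal return-to-go: instead of conditioning on whatever return one sampled trajectory happened to achieve, condition on the best return-to-go the dataset ever achieved from (a discretization of) the current state. The state binning and data are illustrative assumptions.

    import numpy as np

    dataset = [  # (state_bin, return_to_go) pairs from offline trajectories
        (0, 1.0), (0, 3.0), (1, 2.0), (1, 0.5), (2, 4.0),
    ]
    best_rtg: dict[int, float] = {}
    for s, g in dataset:
        best_rtg[s] = max(g, best_rtg.get(s, float("-inf")))

    def conditioning_target(state_bin: int) -> float:
        # Fed to the RCSL policy as its return input at decision time.
        return best_rtg[state_bin]

    print(conditioning_target(0))  # -> 3.0, not whatever one trajectory had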

Arxiv · 4d · Image Credit: Arxiv

MAC: An Efficient Gradient Preconditioning using Mean Activation Approximated Curvature

  • Second-order optimization methods like KFAC offer superior convergence by utilizing curvature information of the loss landscape.
  • MAC, a computationally efficient optimization method, is proposed based on an analysis of the components of the layer-wise Fisher information matrix used in KFAC (a generic Kronecker-factored preconditioning sketch follows below).
  • MAC is unique for applying the Kronecker factorization to the FIM of attention layers in transformers and integrating attention scores into preconditioning.
  • Extensive evaluations show that MAC outperforms KFAC and other methods in terms of accuracy, training time, and memory usage across various network architectures and datasets.
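
MAC's mean-activation approximation and its attention-layer treatment are not reproduced; the sketch below shows generic KFAC-style preconditioning of a single layer's gradient using Kronecker factors built from input activations and output gradients. Batch statistics and the damping value are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    acts = rng.normal(size=(64, 10))        # layer inputs over a batch
    grads_out = rng.normal(size=(64, 5))    # backpropagated output gradients
    G = acts.T @ grads_out / 64             # weight gradient, shape (10, 5)

    A = acts.T @ acts / 64                  # input-side Kronecker factor
    S = grads_out.T @ grads_out / 64        # output-side Kronecker factor
    damp = 1e-3                             # damping for invertibility
    precond_G = np.linalg.solve(A + damp * np.eye(10),
                                np.linalg.solve(S + damp * np.eye(5), G.T).T)
    print(precond_G.shape)                  # (10, 5): preconditioned update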

Arxiv · 4d · Image Credit: Arxiv

AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin

  • Large language models (LLMs) are vulnerable to safety risks during fine-tuning, where small amounts of data can compromise safeguards.
  • Perturbations along the alignment direction in fine-tuning preserve model safety, while perturbations along orthogonal directions can rapidly degrade safety.
  • A methodology called AsFT (Anchoring Safety in Fine-Tuning) is proposed to constrain fine-tuning within the narrow safety basin by suppressing updates in harmful directions (the suppression step is sketched below).
  • Experiments show that AsFT outperforms Safe LoRA, reducing harmful behavior, improving model performance, and maintaining robustness across various settings.
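
How the alignment direction is obtained is part of the paper's analysis; the sketch below only illustrates the suppression step the summary describes, with the direction, the scaling factor, and the function name as assumptions.

    import torch

    def constrained_update(grad: torch.Tensor, align_dir: torch.Tensor,
                           orth_scale: float = 0.1) -> torch.Tensor:
        # Decompose the fine-tuning update into its component along the
        # alignment direction and damp the (potentially safety-degrading)
        # orthogonal remainder.
        d = align_dir / align_dir.norm()
        parallel = (grad @ d) * d          # safety-preserving component
        orthogonal = grad - parallel       # suppressed component
        return parallel + orth_scale * orthogonal

    g = torch.randn(16)
    d_align = torch.randn(16)              # e.g. aligned-minus-unaligned weights
    print(constrained_update(g, d_align).shape)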

Arxiv · 4d · Image Credit: Arxiv

Thermodynamically Consistent Latent Dynamics Identification for Parametric Systems

  • A new thermodynamics-informed latent space dynamics identification framework, tLaSDI, has been proposed for modeling parametric nonlinear dynamical systems.
  • The framework combines autoencoders for dimensionality reduction with parametric GENERIC formalism-informed neural networks (pGFINNs) to efficiently learn parametric latent dynamics while upholding thermodynamic principles like free energy conservation and entropy generation.
  • A physics-informed active learning strategy is included to improve model performance through adaptive sampling of training data based on a residual-based error indicator, resulting in better outcomes than uniform sampling at the same computational cost (a toy adaptive-sampling loop follows below).
  • Numerical experiments on different equations demonstrate that the proposed method achieves significant speed-up, reduced relative errors, and lower training and inference costs, while also providing insights into the thermodynamic behavior of the system through learned latent space dynamics.
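
The pGFINN dynamics are not reproduced; the toy loop below only illustrates residual-based adaptive sampling: repeatedly add the candidate parameter where an error indicator is largest. The indicator here is a fabricated stand-in for the paper's residual.

    import numpy as np

    pool = np.linspace(0.0, 1.0, 21)           # candidate system parameters
    training = [0.0, 1.0]                      # initial training parameters

    def error_indicator(p: float) -> float:
        # Stand-in for a residual-based estimate: error grows with distance
        # from already-sampled parameters.
        return min(abs(p - q) for q in training)

    for _ in range(3):                         # adaptive sampling rounds
        best = max(pool, key=error_indicator)  # worst-covered candidate
        training.append(float(best))           # retrain the model here (omitted)
    print(sorted(training))                    # greedy fill-in of parameter space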
