techminis

A naukri.com initiative

ML News

Arxiv · 2d · 114 reads

Function Fitting Based on Kolmogorov-Arnold Theorem and Kernel Functions

  • This paper proposes a unified theoretical framework based on the Kolmogorov-Arnold representation theorem and kernel methods.
  • The framework establishes a kernel-based feature fitting approach that unifies Kolmogorov-Arnold Networks (KANs) and self-attention mechanisms.
  • A low-rank Pseudo-Multi-Head Self-Attention module (Pseudo-MHSA) is introduced, which reduces parameter count by nearly 50% compared to traditional MHSA.
  • Experiments on the CIFAR-10 dataset under the MAE framework indicate that the proposed model performs comparably to the ViT model.

Arxiv · 2d · 328 reads

Prediction of 30-day hospital readmission with clinical notes and EHR information

  • High hospital readmission rates are associated with significant costs and health risks for patients.
  • Predictive models are crucial for helping clinicians identify patients who are likely to be readmitted within a short period.
  • Combining clinical notes and electronic health records (EHRs) helps in predicting 30-day hospital readmissions.
  • A graph neural network (GNN) is used to integrate both information sources, achieving an AUROC of 0.72 and a balanced accuracy of 66.7%.
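
The two metrics quoted above can be made concrete with a small sketch. The labels and scores below are invented toy values, not data from the paper, and the 0.5 decision threshold is an assumption; the code only illustrates how AUROC and balanced accuracy are defined.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(labels, scores, threshold=0.5):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


# Toy readmission labels (1 = readmitted within 30 days) and model scores.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.7]

auc = auroc(labels, scores)
bal = balanced_accuracy(labels, scores)
print(f"AUROC={auc:.3f}, balanced accuracy={bal:.3f}")
```

Balanced accuracy is often reported for readmission tasks because readmitted patients are usually the minority class, so plain accuracy can look high for a model that rarely predicts readmission.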

Arxiv · 2d · 55 reads

Unsupervised Anomaly Detection in Multivariate Time Series across Heterogeneous Domains

  • The widespread adoption of digital services has increased the need for anomaly detection in IT operations.
  • A unifying framework for benchmarking unsupervised anomaly detection methods is introduced.
  • The problem of shifts in normal behaviors in AIOps scenarios is highlighted.
  • The proposed approach, Domain-Invariant VAE for Anomaly Detection (DIVAD), outperforms existing methods.

Arxiv · 2d · 185 reads

TRACE: Intra-visit Clinical Event Nowcasting via Effective Patient Trajectory Encoding

  • Researchers propose a new model called TRACE for intra-visit clinical event nowcasting in electronic health records (EHR).
  • The model effectively encodes patient trajectories and captures temporal dependencies.
  • It outperforms previous methods in laboratory measurement prediction, improving patient care.
  • The code for the model is available on GitHub for further exploration and use.

Arxiv · 2d · 351 reads

Beyond Standard MoE: Mixture of Latent Experts for Resource-Efficient Language Models

  • Mixture of Experts (MoE) has emerged as a pivotal architectural paradigm for efficient scaling of Large Language Models (LLMs), operating through selective activation of parameter subsets for each input token.
  • In this paper, the authors introduce Mixture of Latent Experts (MoLE), a novel parameterization methodology that facilitates the mapping of specific experts into a shared latent space.
  • The MoLE architecture significantly reduces parameter count and computational requirements, addressing challenges such as excessive memory utilization and communication overhead during training and inference.
  • Empirical evaluations demonstrate that MoLE achieves performance comparable to standard MoE implementations while substantially reducing resource requirements.
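
As a rough illustration of the "selective activation" that standard MoE layers rely on, here is a minimal top-k routing sketch. It shows only the conventional gating step, not the latent-space parameterization that MoLE proposes; the gate weights, the toy experts, and the name `moe_forward` are all hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(token, gate_weights, experts, k=2):
    """Route one token to its top-k experts and mix their outputs.

    gate_weights holds one row per expert; the router logit of an expert
    is the dot product of its row with the token vector.
    """
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])  # renormalize over the chosen experts
    out = [0.0] * len(token)
    for w, i in zip(weights, top):
        y = experts[i](token)  # only the selected experts are evaluated
        out = [o + w * v for o, v in zip(out, y)]
    return out, top


# Four toy experts that simply scale the token by a constant (hypothetical).
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[0.0, 1.0], [2.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
token = [1.0, 0.0]

out, chosen = moe_forward(token, gate_weights, experts, k=2)
print(chosen, out)
```

Only k of the experts run per token, which is where the compute saving comes from; every expert's parameters still have to be stored, which is the memory cost the summary says MoLE targets.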

Arxiv · 2d · 355 reads

RL2Grid: Benchmarking Reinforcement Learning in Power Grid Operations

  • Reinforcement learning (RL) can transform power grid operations by providing adaptive and scalable controllers essential for grid decarbonization.
  • RL2Grid is a benchmark designed in collaboration with power system operators to accelerate progress in grid control and foster RL maturity.
  • RL2Grid standardizes tasks, state and action spaces, and reward structures within a unified interface for systematic evaluation and comparison of RL approaches.
  • The benchmark results highlight the challenges power grids pose for RL methods, emphasizing the need for novel algorithms capable of handling real-world physical systems.

Arxiv · 2d · 367 reads

Fast Training of Recurrent Neural Networks with Stationary State Feedbacks

  • Recurrent neural networks (RNNs) have recently shown strong performance and faster inference than Transformers.
  • A novel method is proposed to replace the computationally expensive backpropagation through time (BPTT) algorithm with a fixed gradient feedback mechanism.
  • The method leverages state-space model (SSM) principles to directly propagate gradients from future time steps, reducing training overhead.
  • Experiments on language modeling benchmarks demonstrate competitive perplexity scores while significantly reducing training costs.

Arxiv · 2d · 83 reads

How to safely discard features based on aggregate SHAP values

  • A study investigates the practice of discarding unimportant features based on small aggregate SHAP values.
  • The study finds that small aggregate SHAP values do not necessarily imply that the corresponding feature has no effect on the function.
  • To address this issue, the study suggests aggregating SHAP values over the extended support, which is the product of the marginals of the underlying distribution.
  • The study also extends the findings to KernelSHAP: when it is computed over the extended distribution, a small aggregate value justifies feature removal, regardless of how accurately KernelSHAP approximates the true SHAP values.
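
The gap between aggregating over the data and over the extended support can be seen in a two-feature toy case. The sketch below is our own construction, not an example from the paper: it assumes interventional SHAP with the data as background, uses the mean absolute SHAP value as the aggregate, and picks the function `f` purely for illustration.

```python
from itertools import product


def value(f, x, subset, background):
    """Interventional value function: features in `subset` are fixed to x,
    the remaining ones are averaged over the background points."""
    total = 0.0
    for z in background:
        mixed = tuple(x[i] if i in subset else z[i] for i in range(len(x)))
        total += f(mixed)
    return total / len(background)


def shap_two_features(f, x, background):
    """Exact Shapley values for a function of exactly two features."""
    v0 = value(f, x, set(), background)
    v1 = value(f, x, {0}, background)
    v2 = value(f, x, {1}, background)
    v12 = value(f, x, {0, 1}, background)
    phi1 = 0.5 * ((v1 - v0) + (v12 - v2))
    phi2 = 0.5 * ((v2 - v0) + (v12 - v1))
    return phi1, phi2


def aggregate_abs(f, points, background, feature):
    """Mean absolute SHAP value of one feature over the given points."""
    vals = [abs(shap_two_features(f, x, background)[feature]) for x in points]
    return sum(vals) / len(vals)


# Perfectly correlated data: the two features are always equal.
data = [(0.0, 0.0), (1.0, 1.0)]
# Extended support: product of the two marginals (both uniform on {0, 1} here).
extended = list(product(sorted({x[0] for x in data}), sorted({x[1] for x in data})))


def f(x):
    # Zero on the data, but clearly dependent on feature 0 elsewhere.
    return (x[0] - x[1]) ** 2


agg_data = aggregate_abs(f, data, data, feature=0)
agg_extended = aggregate_abs(f, extended, data, feature=0)
print(agg_data, agg_extended)
```

In this toy case the aggregate over the data is exactly zero while the aggregate over the extended support is not, even though `f` plainly depends on the first feature.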

Arxiv · 2d · 79 reads

Agent-Based Modeling and Deep Neural Networks for Establishing Digital Twins of Secure Facilities under Sensing Restrictions

  • Digital twin technologies help practitioners simulate, monitor, and predict undesirable outcomes in-silico, while avoiding the cost and risks of conducting live simulation exercises.
  • Virtual reality (VR) based digital twin technologies are especially useful when monitoring human Patterns of Life (POL) in secure nuclear facilities, where live simulation exercises are too dangerous and costly to ever perform.
  • The challenge of collecting data in high-security facilities led to the use of an agent-based model driven by human activity patterns to generate synthetic movement trajectories in a digital twin system called MetaPOL.
  • The study evaluates the efficacy of using deep neural networks to predict the simulated trajectories and distinguish NPC (non-player character) movement during normal operations from that during a simulated emergency response scenario.

Arxiv · 2d · 201 reads

Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

  • Text-to-SQL is a challenging task involving multiple reasoning-intensive subtasks, and existing approaches often rely on handcrafted reasoning paths.
  • A novel set of partial rewards tailored for the Text-to-SQL task is proposed, which addresses the reward sparsity issue in reinforcement learning (RL).
  • The proposed rewards include schema-linking, AI feedback, n-gram similarity, and syntax check to enhance reasoning capabilities and generalization.
  • RL-only training with the proposed rewards achieves higher accuracy and superior generalization compared to supervised fine-tuning (SFT) approaches.
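
To give a feel for what partial rewards might look like, the sketch below combines two of the listed signals, a syntax check and n-gram similarity, into one score. It is not the paper's reward design: the bigram Jaccard measure, the use of SQLite's `EXPLAIN` as the checker, the equal weights, and the function names are all assumptions made for illustration.

```python
import sqlite3


def ngrams(tokens, n):
    """Set of n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_similarity(candidate, reference, n=2):
    """Jaccard overlap of whitespace-token n-grams, case-insensitive."""
    a = ngrams(candidate.lower().split(), n)
    b = ngrams(reference.lower().split(), n)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def syntax_ok(sql, schema):
    """True if SQLite can compile `sql` against `schema` without running it.

    Note that this also rejects references to unknown tables or columns,
    so it is slightly stricter than a pure grammar check.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        conn.execute("EXPLAIN " + sql)
        return True
    except (sqlite3.Error, sqlite3.Warning):
        return False
    finally:
        conn.close()


def partial_reward(candidate, reference, schema, w_syntax=0.5, w_ngram=0.5):
    """Weighted sum of the two partial signals (weights are hypothetical)."""
    return (w_syntax * float(syntax_ok(candidate, schema))
            + w_ngram * ngram_similarity(candidate, reference))


schema = "CREATE TABLE users (name TEXT, age INTEGER);"
reference = "SELECT name FROM users WHERE age > 30"

r_exact = partial_reward(reference, reference, schema)
r_partial = partial_reward("SELECT name FROM users", reference, schema)
r_broken = partial_reward("SELEC name FROM users", reference, schema)
print(r_exact, r_partial, r_broken)
```

A query that is valid but incomplete still earns some reward here, which is the kind of denser signal the summary contrasts with a sparse pass/fail outcome.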

Arxiv · 2d · 324 reads

Graph ODEs and Beyond: A Comprehensive Survey on Integrating Differential Equations with Graph Neural Networks

  • Graph Neural Networks (GNNs) and differential equations (DEs) are two rapidly advancing areas of research that have shown remarkable synergy in recent years.
  • This survey provides a comprehensive overview of the research at the intersection of GNNs and DEs.
  • The survey categorizes existing methods, discusses their underlying principles, and highlights their applications across different domains.
  • Open challenges and future research directions in this interdisciplinary field are also identified.

Arxiv · 2d · 39 reads

TRA: Better Length Generalisation with Threshold Relative Attention

  • Transformers struggle with length generalisation, displaying poor performance even on basic tasks.
  • Two key failures of the self-attention mechanism in Transformers are identified: inability to fully remove irrelevant information and unintentional up-weighting of irrelevant information due to learned positional biases.
  • Selective sparsity and contextualised relative distance are proposed as two mitigations to improve the generalisation capabilities of decoder-only Transformers.
  • Refactoring the attention mechanism with these two mitigations in place can substantially improve the length generalisation of Transformers.

Arxiv · 2d · 233 reads

A QUBO Framework for Team Formation

  • A QUBO framework for team formation has been introduced.
  • The objective is to find a set of experts that maximizes skill coverage while minimizing costs.
  • Three TeamFormation variants with different cost functions are formulated using quadratic unconstrained binary optimization (QUBO).
  • QUBO-based solutions leveraging graph neural networks enable transfer learning.
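
A coverage-versus-cost objective of this kind can be written as a small QUBO, as sketched below. This is an illustrative encoding rather than any of the paper's three variants: coverage is approximated by a pairwise inclusion-exclusion term (exact only when no skill is shared by three or more selected experts), the trade-off weight `lam` is an assumption, and exhaustive search stands in for the graph-neural-network solver mentioned above.

```python
from itertools import combinations, product


def build_qubo(skills, costs, lam=1.0):
    """Upper-triangular Q for minimizing x^T Q x, with x_i = 1 if expert i is picked.

    Minimizes lam * cost - coverage, where coverage is approximated as
    sum_i |S_i| x_i - sum_{i<j} |S_i & S_j| x_i x_j.
    """
    n = len(skills)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][i] = lam * costs[i] - len(skills[i])
    for i, j in combinations(range(n), 2):
        Q[i][j] = float(len(skills[i] & skills[j]))  # penalize double-counted skills
    return Q


def energy(Q, x):
    """Value of x^T Q x for an upper-triangular Q and a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))


def solve_brute_force(Q):
    """Enumerate all 2^n assignments; only feasible for tiny instances."""
    return min(product((0, 1), repeat=len(Q)), key=lambda x: energy(Q, x))


# Toy instance (hypothetical experts, skills, and costs).
skills = [{"python", "sql"}, {"ml", "stats", "sql"}, {"design"}]
costs = [0.5, 1.0, 2.0]

Q = build_qubo(skills, costs)
best = solve_brute_force(Q)
best_energy = energy(Q, best)
print(best, best_energy)
```

Here the first two experts are selected: together they cover four skills at a cost of 1.5, while the third expert's single skill does not justify its cost.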

Arxiv · 2d · 193 reads

UP-ROM: Uncertainty-Aware and Parametrised dynamic Reduced-Order Model, application to unsteady flows

  • Reduced order models (ROMs) are important in fluid mechanics for low-cost predictions in engineering applications.
  • A new nonlinear reduction strategy is presented for transient flows, incorporating parametrization and uncertainty quantification.
  • The strategy uses a variational auto-encoder (VAE) with variational inference for confidence measurement.
  • The incorporation of attention mechanisms enhances generalization across different dynamics, improving model performance.

Arxiv · 2d · 23 reads

Two Heads Are Better than One: Model-Weight and Latent-Space Analysis for Federated Learning on Non-iid Data against Poisoning Attacks

  • Federated Learning is vulnerable to model poisoning attacks due to its distributed nature.
  • Existing defenses against model poisoning attacks assume that the data at remote clients are iid, whereas in practice they are non-iid.
  • GeminiGuard is a novel defense approach that addresses the gap in non-iid scenarios.
  • GeminiGuard incorporates model-weight analysis and latent-space analysis to enhance defense performance.
