techminis

A naukri.com initiative

ML News

Source: Arxiv

Quantile Activation: Correcting a Failure Mode of ML Models

  • An established failure mode for machine learning models occurs when the same features are equally likely to belong to class 0 and class 1.
  • Standard neural network architectures like MLPs or CNNs are not equipped to handle this problem.
  • A simple activation function called quantile activation (QACT) is proposed to address this issue.
  • QACT produces the relative quantile of a sample within its context distribution (see the sketch below), improving generalization across distortions compared to conventional classifiers.

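A minimal NumPy sketch of the idea, assuming the current mini-batch stands in for the context distribution (the paper defines the context more carefully): each pre-activation is replaced by its empirical quantile among that unit's values.

```python
import numpy as np

def quantile_activation(z):
    # z: (batch, units) pre-activations. Treating the batch as the context
    # distribution (an assumption of this sketch), map each value to its
    # empirical quantile in (0, 1) within its unit's distribution.
    ranks = z.argsort(axis=0).argsort(axis=0)  # rank of each value per unit
    return (ranks + 0.5) / z.shape[0]          # mid-rank quantile estimate
```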

Source: Arxiv

ARC: A Generalist Graph Anomaly Detector with In-Context Learning

  • Graph anomaly detection (GAD) is a technique to identify abnormal nodes within a graph.
  • Current GAD methods require dataset-specific training, leading to high costs and limited generalizability.
  • ARC is a generalist GAD approach that can detect anomalies across various graph datasets on-the-fly.
  • ARC uses in-context learning to extract dataset-specific patterns without retraining or fine-tuning (the flavor of the idea is sketched below).

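ARC's architecture is not detailed in the summary; the snippet below only illustrates the in-context flavor of the idea, with a few known-normal node embeddings supplied as context and distance from all of them used as the anomaly score. The names and the scoring rule are this sketch's assumptions.

```python
import torch
import torch.nn.functional as F

def anomaly_scores(node_emb, context_emb):
    # node_emb: (n, d) embeddings of all nodes in the target graph;
    # context_emb: (m, d) embeddings of a few known-normal nodes given
    # as in-context examples. Illustrative only, not ARC's actual model.
    z = F.normalize(node_emb, dim=-1)
    c = F.normalize(context_emb, dim=-1)
    sim = z @ c.T                         # cosine similarity to each context node
    return 1.0 - sim.max(dim=-1).values   # far from all normals => anomalous
```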

Source: Arxiv

Hierarchical Classification Auxiliary Network for Time Series Forecasting

  • Deep learning has revolutionized time series forecasting by capturing sequence relationships.
  • However, training with Mean Squared Error (MSE) loss often leads to overly smooth predictions.
  • To address this, the authors propose tokenizing time series values and training with cross-entropy loss (see the sketch below).
  • The approach includes a Hierarchical Classification Auxiliary Network (HCAN) to integrate high-entropy features at different hierarchy levels.

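A minimal sketch of the tokenize-then-classify step (bin count and value range are arbitrary choices here, and HCAN's hierarchical levels are omitted):

```python
import torch
import torch.nn.functional as F

def tokenize(y, n_bins=256, lo=-1.0, hi=1.0):
    # Discretize continuous series values into integer "tokens" over [lo, hi].
    y = y.clamp(lo, hi)
    return ((y - lo) / (hi - lo) * (n_bins - 1)).round().long()

# logits: (batch, n_bins) from a classification head; y: (batch,) targets.
# loss = F.cross_entropy(logits, tokenize(y))  # replaces the usual MSE
```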

Source: Arxiv

Memory-Efficient Gradient Unrolling for Large-Scale Bi-level Optimization

  • Bi-level optimization has become crucial for hierarchical machine learning problems, but traditional gradient-based algorithms are not suitable for large-scale applications.
  • A new approach called FG2U (Forward Gradient Unrolling with Forward Gradient) is introduced, providing more accurate gradient estimates and supporting parallel computing (the forward-gradient estimator it builds on is sketched below).
  • FG2U can be used in different stages of the training process and is easily implemented in deep learning frameworks.
  • Extensive evaluations demonstrate the superior performance of FG2U in diverse large-scale bi-level optimization tasks.

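The full bi-level unrolling is beyond a snippet, but the forward-gradient estimator that FG2U builds on is compact with PyTorch's forward-mode autodiff; the sample count below is arbitrary.

```python
import torch
from torch.func import jvp

def forward_gradient(f, x, n_samples=8):
    # Unbiased gradient estimate via forward mode: for v ~ N(0, I),
    # E[(grad f . v) v] = grad f, and each sample costs a single JVP,
    # so samples can be evaluated in parallel.
    g = torch.zeros_like(x)
    for _ in range(n_samples):
        v = torch.randn_like(x)
        _, dfv = jvp(f, (x,), (v,))  # directional derivative (grad f . v)
        g += dfv * v
    return g / n_samples
```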

Source: Arxiv

Applications of Scientific Machine Learning for the Analysis of Functionally Graded Porous Beams

  • This study investigates different Scientific Machine Learning (SciML) approaches for the analysis of functionally graded porous beams.
  • The methods treat the output of a neural network or operator as an approximation of the displacement fields and derive the equations governing beam behavior from it.
  • The study compares three approaches: (a) Physics-Informed Neural Network (PINN), (b) Deep Energy Method (DEM), and (c) Neural Operator methods (a toy PINN residual is sketched below).
  • A neural operator has been trained to predict the response of the porous beam with functionally graded material under any porosity distribution pattern and traction condition.

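For a flavor of the PINN approach, the residual of the classical Euler-Bernoulli beam equation EI·w''''(x) = q(x) can be built with autograd. This is a toy version with constant EI and q; in the paper's functionally graded porous beams the stiffness varies with position and porosity.

```python
import torch

def beam_residual(model, x, EI=1.0, q=1.0):
    # model: network mapping position x -> deflection w(x), x of shape (N, 1).
    # A PINN loss would be beam_residual(...).pow(2).mean() plus boundary terms.
    x = x.clone().requires_grad_(True)
    w = model(x)
    d = w
    for _ in range(4):  # fourth derivative w'''' via repeated autograd
        d = torch.autograd.grad(d.sum(), x, create_graph=True)[0]
    return EI * d - q   # residual of EI * w'''' = q
```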

Source: Arxiv

FSDEM: Feature Selection Dynamic Evaluation Metric

  • Expressive evaluation metrics are indispensable for informative experiments; while several metrics are well established in some areas, others, such as feature selection, offer only indirect or otherwise limited ones.
  • In this paper, the authors propose a novel evaluation metric that addresses several problems of its predecessors and allows flexible, reliable evaluation of feature selection algorithms.
  • The proposed metric is dynamic, with two properties that can be used to evaluate both the performance and the stability of a feature selection algorithm (a rough stand-in is sketched below).
  • Empirical experiments illustrate the metric's use in evaluating feature selection algorithms, with a comparison and analysis of the different aspects involved.

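The paper's exact metric is not reproduced in the summary; the sketch below is a rough stand-in for the "dynamic" idea, scoring a feature ranking at every prefix size with a simple downstream model and aggregating the resulting curve.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def dynamic_selection_score(X, y, ranking):
    # ranking: feature indices ordered by a selection algorithm. The curve
    # tracks performance as the feature count grows; its mean is one
    # possible aggregate (the choice of model and aggregate is ours).
    curve = []
    for k in range(1, len(ranking) + 1):
        acc = cross_val_score(LogisticRegression(max_iter=1000),
                              X[:, ranking[:k]], y, cv=3).mean()
        curve.append(acc)
    return np.mean(curve), curve
```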

Source: Arxiv

Flow Matching for Optimal Reaction Coordinates of Biomolecular System

  • Flow matching for reaction coordinates (FMRC) is a new deep learning algorithm for identifying optimal reaction coordinates (RC) in reversible biomolecular dynamics.
  • FMRC reformulates the lumpability and decomposability principles into a conditional probability framework for efficient data-driven optimization (the generic flow-matching objective it adapts is sketched below).
  • While not explicitly learning the transfer operator or its eigenfunctions, FMRC encodes the dynamics of leading eigenfunctions into a low-dimensional RC space.
  • FMRC outperforms several state-of-the-art algorithms in constructing Markov state models (MSM) in biomolecular systems and demonstrates potential applications in enhanced sampling and MSM construction.

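FMRC itself is not spelled out in the summary, but the generic conditional flow-matching objective it adapts is compact: regress a velocity network onto the velocity of a simple interpolation path between paired samples.

```python
import torch

def cfm_loss(v_theta, x0, x1):
    # v_theta(x, t): network predicting a velocity field. For the linear
    # path x_t = (1 - t) * x0 + t * x1, the target velocity is x1 - x0.
    t = torch.rand(x0.shape[0], 1)
    xt = (1 - t) * x0 + t * x1
    return ((v_theta(xt, t) - (x1 - x0)) ** 2).mean()
```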

Source: Arxiv

The FIX Benchmark: Extracting Features Interpretable to eXperts

  • Feature-based methods are commonly used to explain model predictions, but often assume interpretable features are readily available.
  • The FIX benchmark (Features Interpretable to eXperts) aims to measure how well a collection of features aligns with expert knowledge.
  • FIXScore is proposed as a unified expert-alignment measure applicable to diverse real-world settings across domains and data modalities (an illustrative alignment measure is sketched below).
  • Popular feature-based explanation methods perform poorly in terms of alignment with expert-specified knowledge, signaling the need for better methods.

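As a rough illustration of "alignment with expert knowledge" (not the paper's FIXScore, which is defined per setting), a proposed feature group can be scored by its best overlap with any expert-specified group:

```python
import numpy as np

def alignment(feature_mask, expert_masks):
    # feature_mask: boolean mask over inputs (e.g., pixels or tokens) for one
    # proposed feature; expert_masks: list of expert-specified group masks.
    ious = [(feature_mask & m).sum() / (feature_mask | m).sum()
            for m in expert_masks]
    return max(ious)  # best IoU against any expert group
```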

Source: Arxiv

SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration

  • SageAttention is a highly efficient and accurate quantization method for attention in transformer architectures (an 8-bit score-path sketch follows this list).
  • Attention has O(N^2) computational complexity and becomes the primary time-consuming component at large sequence lengths.
  • SageAttention outperforms FlashAttention2 and xformers in terms of operations per second (OPS) by about 2.1 times and 2.7 times, respectively.
  • Comprehensive experiments show that SageAttention incurs almost no end-to-end metrics loss across diverse models.

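A minimal NumPy sketch of the 8-bit score path, using per-tensor scales for brevity (SageAttention itself uses finer-grained quantization and a smoothing step for K):

```python
import numpy as np

def quantize_int8(x):
    scale = np.abs(x).max() / 127.0          # symmetric per-tensor scale
    return np.round(x / scale).astype(np.int8), scale

def int8_attention_scores(Q, K):
    # Q, K: (seq, d) in float. Do the O(N^2) QK^T in INT8 with INT32
    # accumulation, then dequantize before the softmax.
    q8, sq = quantize_int8(Q)
    k8, sk = quantize_int8(K)
    S = q8.astype(np.int32) @ k8.astype(np.int32).T
    return S.astype(np.float32) * (sq * sk) / np.sqrt(Q.shape[-1])
```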

Source: Arxiv

Horizon-Length Prediction: Advancing Fill-in-the-Middle Capabilities for Code Generation with Lookahead Planning

  • Fill-in-the-Middle (FIM) is a technique used in code language models to generate missing code given left and right contexts.
  • The current FIM training paradigm often leads to models struggling to generate content that aligns smoothly with the surrounding context.
  • To address this, the authors propose Horizon-Length Prediction (HLP), a novel training objective that teaches models to predict the number of remaining middle tokens at each step (see the sketch below).
  • HLP significantly improves FIM performance, with relative gains of up to 24% on diverse benchmarks, without resorting to unrealistic post-processing methods.

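A hedged sketch of the objective, here cast as regression on the normalized remaining-token count (the paper's exact formulation may differ):

```python
import torch
import torch.nn.functional as F

def hlp_loss(hidden, head):
    # hidden: (T, d) decoder states over the middle span; head: nn.Linear(d, 1).
    # The target at step t is the fraction of middle tokens still to generate.
    T = hidden.shape[0]
    target = torch.arange(T - 1, -1, -1, dtype=torch.float32) / T
    pred = head(hidden).squeeze(-1)
    return F.mse_loss(pred, target)
```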

Source: Arxiv

Cluster-wise Graph Transformer with Dual-granularity Kernelized Attention

  • The paper introduces the Node-to-Cluster Attention (N2C-Attn) mechanism for graph learning.
  • N2C-Attn incorporates techniques from Multiple Kernel Learning to capture information at both node and cluster levels (a toy combination of the two kernels is sketched below).
  • The resulting architecture, Cluster-wise Graph Transformer (Cluster-GT), outperforms other methods on graph-level tasks.

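The exact operator is not given in the summary; the toy below only shows the MKL-style move of mixing a cluster-level kernel (similarity to centroids) with a node-level kernel (best match inside each cluster) before attending from nodes to clusters. Both kernel choices and the mixing weight are this sketch's assumptions.

```python
import torch

def node_to_cluster_attention(x, assign, w=0.5):
    # x: (n, d) node features; assign: (n,) cluster ids in [0, C);
    # assumes every cluster is non-empty.
    C = int(assign.max()) + 1
    centroids = torch.stack([x[assign == c].mean(0) for c in range(C)])
    s_cluster = x @ centroids.T                                   # (n, C)
    s_node = torch.stack([(x @ x[assign == c].T).max(-1).values
                          for c in range(C)], dim=1)              # (n, C)
    attn = torch.softmax(w * s_cluster + (1 - w) * s_node, dim=-1)
    return attn @ centroids   # each node aggregates cluster summaries
```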

Source: Arxiv

Is Parameter Collision Hindering Continual Learning in LLMs?

  • Large Language Models (LLMs) often suffer from catastrophic forgetting when learning multiple tasks sequentially, making continual learning (CL) essential for their dynamic deployment.
  • Existing state-of-the-art (SOTA) methods focus on constructing orthogonal task subspaces to decouple parameter interdependence across domains.
  • However, this paper argues that building non-collision parameters is the more critical factor in addressing CL challenges (a sketch of one way to measure collisions follows this list).
  • The proposed approach, Non-collision Low-Rank Adaptation (N-LoRA), leverages low collision rates to enhance CL in LLMs, achieving superior performance, higher task orthogonality, and lower parameter collision than SOTA methods.

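One way to make "parameter collision" concrete (the threshold-based definition is this sketch's assumption): count the positions where two tasks' adapter updates to the same weight are both non-negligible.

```python
import torch

def collision_rate(delta_a, delta_b, eps=1e-6):
    # delta_a, delta_b: parameter updates (e.g., merged LoRA deltas) that two
    # tasks apply to the same weight matrix. Sparse, non-overlapping updates
    # yield a low rate, which is what N-LoRA encourages.
    a = delta_a.abs() > eps
    b = delta_b.abs() > eps
    return (a & b).float().mean().item()
```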

Source: Arxiv

HashEvict: A Pre-Attention KV Cache Eviction Strategy using Locality-Sensitive Hashing

  • Transformer-based large language models (LLMs) use the key-value (KV) cache to accelerate inference by storing past token embeddings.
  • HashEvict is an algorithm that uses locality-sensitive hashing (LSH) to compress the KV cache.
  • HashEvict quickly locates cached tokens that are cosine-dissimilar to the current query token (see the SimHash sketch below).
  • HashEvict can compress the KV cache by 30%-70% while maintaining high performance across various tasks.

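A minimal SimHash-style sketch of the pre-attention test; the plane count and the evict-the-k-most-dissimilar rule are assumptions of this sketch.

```python
import torch

def lsh_signature(x, planes):
    # SimHash: sign pattern of x against random hyperplanes. The Hamming
    # distance between signatures approximates angular (cosine) distance.
    return (x @ planes) > 0

def eviction_candidates(keys, query, planes, k):
    # keys: (n, d) cached key states; query: (d,); planes: (d, n_bits).
    sig_k = lsh_signature(keys, planes)           # (n, n_bits)
    sig_q = lsh_signature(query, planes)          # (n_bits,)
    hamming = (sig_k != sig_q).sum(-1)
    return hamming.float().topk(k).indices        # most dissimilar -> evict
```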

Source: Arxiv

The Road to Artificial SuperIntelligence: A Comprehensive Survey of Superalignment

  • The emergence of large language models (LLMs) has sparked the possibility of Artificial Superintelligence (ASI), a hypothetical AI system surpassing human intelligence.
  • Superalignment addresses the challenge of aligning AI systems with human values and safety requirements at superhuman levels of capability.
  • This survey examines scalable oversight methods and potential solutions for superalignment, including the concept of ASI, challenges, and limitations of current alignment paradigms.
  • The survey also discusses key challenges and proposes pathways for the safe and continual improvement of ASI systems.


Source: Arxiv

Algorithm Design for Continual Learning in IoT Networks

  • Continual learning (CL) is a technique for maintaining a small forgetting loss on previously-learned tasks in an online learning setup.
  • Existing work focuses on reducing forgetting loss under a given task sequence, but does not address the large forgetting loss incurred on earlier, dissimilar tasks when similar tasks keep arriving.
  • In IoT networks, where an autonomous vehicle samples data and learns different tasks as it travels, the order of task patterns can be altered at the price of increased travelling cost.
  • The authors formulate a new optimization problem that studies how to opportunistically route the testing object and alter the task sequence in CL, achieving close-to-optimum performance.
