techminis

A naukri.com initiative

ML News

Image Credit: Arxiv

Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving

  • Key-Value cache (KV cache) compression has emerged as a promising technique for optimizing Large Language Model (LLM) serving.
  • The paper comprehensively reviews existing algorithmic designs and benchmark studies, identifying overlooked aspects of performance measurement that hinder practical adoption.
  • Representative KV cache compression methods are evaluated, uncovering issues that affect computational efficiency and end-to-end latency.
  • Tools are provided to aid future KV cache compression studies and to facilitate practical deployment in production.
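
To make the idea of KV cache compression concrete, here is a minimal sketch of one generic family, token eviction: keep the most recent tokens plus the older tokens with the highest importance score, under a fixed token budget. It is not one of the paper's evaluated methods; the function name, the `budget` and `recent` parameters, and the per-token `scores` (for example, accumulated attention weights) are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def compress_kv_cache(
    keys: Sequence[List[float]],
    values: Sequence[List[float]],
    scores: Sequence[float],
    budget: int,
    recent: int,
) -> Tuple[List[List[float]], List[List[float]], List[int]]:
    """Evict tokens so that at most `budget` (key, value) pairs remain.

    Keeps the `recent` newest tokens unconditionally, then fills the rest of
    the budget with the older tokens that have the highest `scores`.
    Returns the kept keys, the kept values, and their original positions,
    all in the original token order.
    """
    n = len(keys)
    if budget >= n:
        return list(keys), list(values), list(range(n))
    recent = min(recent, budget)
    recent_idx = list(range(n - recent, n))
    # Rank the older tokens by importance and keep as many as the budget allows.
    older_idx = sorted(range(n - recent), key=lambda i: scores[i], reverse=True)
    kept = sorted(older_idx[: budget - recent] + recent_idx)
    return [keys[i] for i in kept], [values[i] for i in kept], kept
```

Eviction shrinks the cache, but the scoring and gather steps are not free, so memory savings do not automatically translate into lower end-to-end latency, which is the kind of effect the summary says the paper measures.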

CITRAS: Covariate-Informed Transformer for Time Series Forecasting

  • CITRAS is a patch-based Transformer that addresses challenges in covariate-informed time series forecasting.
  • It leverages multiple targets and covariates, considering both past and future forecasting horizons.
  • CITRAS introduces two novel mechanisms: Key-Value (KV) Shift and Attention Score Smoothing.
  • Experimental results show that CITRAS achieves state-of-the-art performance in both covariate-informed and multivariate forecasting.
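
CITRAS is described as patch-based. As a rough illustration of what patching means (not CITRAS's code, and without its KV Shift or Attention Score Smoothing), the sketch below splits one series into fixed-length patches that a Transformer would then embed as tokens; `make_patches`, `patch_len` and `stride` are hypothetical names.

```python
from typing import List, Sequence


def make_patches(series: Sequence[float], patch_len: int, stride: int) -> List[List[float]]:
    """Split a 1-D series into (possibly overlapping) fixed-length patches.

    Trailing values that do not fill a whole patch are dropped.
    """
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    return [
        list(series[start:start + patch_len])
        for start in range(0, len(series) - patch_len + 1, stride)
    ]
```

For example, `make_patches([1, 2, 3, 4, 5, 6, 7], 3, 2)` yields three overlapping patches. Each target and covariate channel would presumably be patched the same way, with known-future covariates also patched over the forecast horizon.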

Bayesian Predictive Coding

  • Bayesian Predictive Coding (BPC) is a Bayesian extension of Predictive Coding (PC), an influential theory of information processing in the brain.
  • BPC estimates a posterior distribution over network parameters, allowing for better quantification of epistemic uncertainty.
  • Compared to PC, BPC converges in fewer epochs in the full-batch setting and remains competitive in the mini-batch setting.
  • BPC provides a biologically plausible method for Bayesian learning in the brain and offers attractive uncertainty quantification in deep learning.
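
BPC's own update rules are in the paper. As a much simpler stand-in for what a posterior distribution over network parameters provides, the sketch below computes the exact Gaussian posterior over a single weight `w` in the model y = w·x + noise, assuming a zero-mean Gaussian prior on `w` and known noise variance (standard conjugate Bayesian linear regression, not BPC). The posterior variance plays the role of epistemic uncertainty and shrinks as more data arrives.

```python
from typing import Sequence, Tuple


def weight_posterior(
    xs: Sequence[float],
    ys: Sequence[float],
    noise_var: float = 1.0,
    prior_var: float = 1.0,
) -> Tuple[float, float]:
    """Posterior N(mean, var) over w for y = w * x + N(0, noise_var), w ~ N(0, prior_var)."""
    # Precisions add: prior precision plus the data's contribution.
    precision = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    mean = (sum(x * y for x, y in zip(xs, ys)) / noise_var) / precision
    return mean, 1.0 / precision
```

With xs = [1, 2], ys = [2, 4] and unit variances, the posterior is N(5/3, 1/6); duplicating the data tightens it to N(20/11, 1/11).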

Accelerated Airfoil Design Using Neural Network Approaches

  • This paper demonstrates the use of Convolutional Neural Networks (CNNs) and Deep Neural Networks (DNNs) to predict airfoil shapes from target pressure distributions, and vice versa.
  • The dataset used in this study consists of 1600 airfoil shapes simulated at various Reynolds numbers and angles of attack.
  • The refined models show improved efficiency and reduced training time compared to the CNN model for complex datasets.
  • The proposed CNN and DNN models show promising results and have the potential to accelerate aerodynamic optimization and design of high-performance airfoils.

TransMamba: Flexibly Switching between Transformer and Mamba

  • TransMamba is a framework that combines Transformer and Mamba models for efficient long-sequence processing.
  • TransMamba uses shared parameter matrices to switch between attention and state space model (SSM) mechanisms.
  • The framework includes a Memory converter to bridge Transformer and Mamba models for seamless information flow.
  • Experimental results demonstrate that TransMamba achieves superior training efficiency and performance compared to baselines.
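
The switching idea can be illustrated with a toy in which the same query, key and value vectors feed two modes: causal softmax attention for early positions and a decayed linear recurrence (an SSM-like state update) afterwards. At the switch point a hypothetical `prefix_to_state` folds the earlier keys and values into the recurrent state, loosely mirroring the role the summary gives the Memory converter. This is an assumption-laden sketch with scalar values and a fixed `decay`, not TransMamba's actual parameterization.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_output(q: Sequence[Vec], k: Sequence[Vec], v: Sequence[float], t: int) -> float:
    """Causal softmax attention at position t over positions 0..t (scalar values)."""
    logits = [dot(q[t], k[s]) for s in range(t + 1)]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in logits]
    return sum(w * vs for w, vs in zip(weights, v[: t + 1])) / sum(weights)


def recurrent_step(state: List[float], k_t: Vec, v_t: float, decay: float) -> List[float]:
    """SSM-like update: h_t = decay * h_{t-1} + k_t * v_t."""
    return [decay * h + kt * v_t for h, kt in zip(state, k_t)]


def prefix_to_state(k: Sequence[Vec], v: Sequence[float], upto: int, decay: float) -> List[float]:
    """Hypothetical memory conversion: fold tokens 0..upto-1 into a recurrent state."""
    state = [0.0] * len(k[0])
    for s in range(upto):
        state = recurrent_step(state, k[s], v[s], decay)
    return state


def hybrid_forward(
    q: Sequence[Vec], k: Sequence[Vec], v: Sequence[float], switch: int, decay: float = 0.9
) -> List[float]:
    """Softmax attention for positions < switch, linear recurrence from switch on."""
    outputs = [attention_output(q, k, v, t) for t in range(switch)]
    state = prefix_to_state(k, v, switch, decay)
    for t in range(switch, len(q)):
        state = recurrent_step(state, k[t], v[t], decay)
        outputs.append(dot(q[t], state))  # y_t = q_t . h_t
    return outputs
```

The recurrent mode computes y_t = Σ_{s≤t} decay^(t−s) (q_t·k_s) v_s, a decayed, unnormalized form of linear attention, so its outputs differ from softmax attention; the point of the conversion is only that no prefix information is dropped at the switch.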

Level the Level: Balancing Game Levels for Asymmetric Player Archetypes With Reinforcement Learning

  • This work focuses on generating balanced levels tailored to asymmetric player archetypes in games.
  • The goal is to compensate, through level design, for the disparity in the archetypes' abilities.
  • A reinforcement learning method is used to balance tile-based game levels.
  • The evaluation shows that the method can balance a larger proportion of levels compared to two baseline approaches.

CTSketch: Compositional Tensor Sketching for Scalable Neurosymbolic Learning

  • CTSketch is a novel, scalable neurosymbolic learning algorithm for training neural networks using end-to-end input-output labels.
  • CTSketch decomposes the symbolic program into sub-programs and summarizes each sub-program with a sketched tensor to improve scalability.
  • The algorithm approximates the output distribution of the program using simple tensor operations over input distributions and summaries.
  • CTSketch achieves high accuracy on tasks involving over one thousand inputs, pushing neurosymbolic learning to new scales.
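
A small illustration of the decomposition idea, assuming a task of summing four digits: each `add` sub-program is summarized by a tensor T[a][b][z] (1 when a + b = z), and the output distribution is obtained by contracting that tensor with the input digit distributions. The summaries here are exact; CTSketch instead compresses them with sketched tensors, which this example does not attempt, and all names are hypothetical.

```python
from itertools import product
from typing import Callable, List, Sequence

Dist = List[float]
Tensor3 = List[List[List[float]]]


def summarize(fn: Callable[[int, int], int], n_a: int, n_b: int, n_out: int) -> Tensor3:
    """Exact summary tensor of a binary sub-program: T[a][b][z] = 1 if fn(a, b) == z."""
    return [
        [[1.0 if fn(a, b) == z else 0.0 for z in range(n_out)] for b in range(n_b)]
        for a in range(n_a)
    ]


def apply_summary(T: Tensor3, p_a: Sequence[float], p_b: Sequence[float]) -> Dist:
    """Contract the summary with input distributions: p(z) = sum_ab p_a[a] p_b[b] T[a][b][z]."""
    out = [0.0] * len(T[0][0])
    for a, b in product(range(len(p_a)), range(len(p_b))):
        weight = p_a[a] * p_b[b]
        if weight:
            for z, t in enumerate(T[a][b]):
                out[z] += weight * t
    return out


# Hypothetical task: sum of four digits, decomposed as (d1 + d2) + (d3 + d4).
ADD_DIGITS = summarize(lambda a, b: a + b, 10, 10, 19)   # sums 0..18
ADD_PARTIAL = summarize(lambda a, b: a + b, 19, 19, 37)  # sums 0..36


def sum_of_four(p1: Sequence[float], p2: Sequence[float],
                p3: Sequence[float], p4: Sequence[float]) -> Dist:
    left = apply_summary(ADD_DIGITS, p1, p2)
    right = apply_summary(ADD_DIGITS, p3, p4)
    return apply_summary(ADD_PARTIAL, left, right)
```

Decomposing keeps each summary small (10×10×19 and 19×19×37) instead of one tensor indexed by all four digits at once, which is the scalability argument in miniature.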

Learning a Canonical Basis of Human Preferences from Binary Ratings

  • Recent advances in generative AI have been driven by alignment techniques such as reinforcement learning from human feedback (RLHF).
  • This paper focuses on understanding the preferences encoded in datasets used for RLHF and identifying common human preferences.
  • A small subset of 21 preference categories captures over 89% of preference variation across individuals, serving as a canonical basis of human preferences.
  • The identified preference basis proves useful for model evaluation and training, offering insights into model alignment and successful fine-tuning.
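
The claim that 21 categories capture over 89% of preference variation reads like an explained-variance statement. As an analogy only (the paper's actual decomposition may differ), this hypothetical helper counts how many components, taken largest first, are needed to reach a variance threshold; any example numbers are made up.

```python
from typing import Sequence


def components_needed(variances: Sequence[float], threshold: float) -> int:
    """Smallest number of components (largest first) whose variance share reaches threshold."""
    total = sum(variances)
    if total <= 0:
        raise ValueError("total variance must be positive")
    running = 0.0
    for count, var in enumerate(sorted(variances, reverse=True), start=1):
        running += var
        if running / total >= threshold:
            return count
    return len(variances)
```

For component variances [1, 1, 5, 3], reaching 85% of the total takes three components.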

Predicting Targeted Therapy Resistance in Non-Small Cell Lung Cancer Using Multimodal Machine Learning

  • Lung cancer is the leading cause of cancer death globally, and non-small cell lung cancer (NSCLC) is its most common subtype.
  • A new study has developed a multimodal machine learning model to predict patient resistance to osimertinib, a third-generation EGFR-tyrosine kinase inhibitor, in late-stage NSCLC patients with activating EGFR mutations.
  • The model achieved a c-index of 0.82 on a multi-institutional dataset by integrating data types such as histology images, next-generation sequencing (NGS) data, demographic data, and clinical records.
  • The multimodal model demonstrated superior performance over single modality models, highlighting the importance of combining multiple data types for accurate patient outcome prediction.
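
The c-index (concordance index) measures how well predicted risks rank patients by time to event: 1.0 is a perfect ranking and 0.5 is chance, so 0.82 means roughly 82% of comparable patient pairs are ordered correctly. Below is a minimal sketch of Harrell's C for right-censored data; the paper may use a variant (for example, different tie handling), and pairs with tied times are simply skipped here.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int], risks: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event and
    times[i] < times[j]. It is concordant when risks[i] > risks[j]; tied
    risks count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:  # censored subjects cannot anchor a comparable pair
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```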

Ride-Sourcing Vehicle Rebalancing with Service Accessibility Guarantees via Constrained Mean-Field Reinforcement Learning

  • The rapid expansion of ride-sourcing services presents operational challenges, such as vehicle rebalancing.
  • A scalable mean-field control and reinforcement learning model is proposed for precise vehicle repositioning.
  • An accessibility constraint is integrated to ensure equitable service distribution.
  • Empirical evaluation using real-world data-driven simulation demonstrates the efficiency and robustness of the approach.
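
Two ingredients from the summary can be sketched in isolation, under loudly stated assumptions: the mean-field state is taken to be the share of vehicles in each zone, propagated by a zone-to-zone repositioning policy, and the accessibility requirement is modeled as a hypothetical minimum share per zone, handled with a Lagrange multiplier updated by projected dual ascent (a common constrained-RL device). Policy learning itself is omitted, and none of this is the paper's exact formulation.

```python
from typing import List, Sequence


def propagate(dist: Sequence[float], policy: Sequence[Sequence[float]]) -> List[float]:
    """Mean-field update: share in zone j = sum_i dist[i] * policy[i][j]."""
    n = len(dist)
    return [sum(dist[i] * policy[i][j] for i in range(n)) for j in range(n)]


def constraint_gap(dist: Sequence[float], min_share: float) -> float:
    """Positive when the worst-served zone falls below the (hypothetical) minimum share."""
    return min_share - min(dist)


def lagrangian_reward(reward: float, gap: float, lam: float) -> float:
    """Penalized reward an RL agent would maximize: r - lambda * g."""
    return reward - lam * gap


def dual_update(lam: float, gap: float, lr: float) -> float:
    """Projected dual ascent: raise lambda while violated, lower it (not below 0) otherwise."""
    return max(0.0, lam + lr * gap)
```

Because the multiplier grows only while some zone is under-served, the penalty adapts automatically instead of requiring a hand-tuned fixed weight.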

Many-to-Many Matching via Sparsity Controlled Optimal Transport

  • Many-to-many matching seeks to match multiple points in one set with multiple points in another set.
  • This paper proposes a novel many-to-many matching method that explicitly encodes many-to-many constraints while preventing one-to-one matching.
  • The method includes matching budget constraints and a deformed q-entropy regularization to maximize the matching budget.
  • Experimental results show that the proposed method achieves good performance in generating meaningful many-to-many matchings.
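
For contrast with the proposed method, here is standard entropy-regularized optimal transport solved with Sinkhorn iterations (Shannon entropy, no budget constraints). Because every entry of the plan is a product of positive factors, the result is fully dense: each point sends some mass to every point in the other set. That density illustrates why controlling sparsity matters for meaningful many-to-many matching. `eps` and `n_iter` are illustrative choices.

```python
import math
from typing import List, Sequence


def sinkhorn(
    a: Sequence[float],
    b: Sequence[float],
    cost: Sequence[Sequence[float]],
    eps: float = 0.5,
    n_iter: int = 1000,
) -> List[List[float]]:
    """Entropic OT plan P = diag(u) K diag(v) with K = exp(-cost / eps)."""
    n, m = len(a), len(b)
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows to match `a` and columns to match `b`.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

Smaller `eps` pushes the plan toward the unregularized optimum but slows convergence; it never produces exact zeros.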

Spatio-temporal Prediction of Fine-Grained Origin-Destination Matrices with Applications in Ridesharing

  • Accurate spatial-temporal prediction of network-based travelers' requests is crucial for the effective policy design of ridesharing platforms.
  • This paper introduces a novel prediction model, OD-CED, for fine-grained Origin-Destination (OD) demand prediction in ridesharing platforms.
  • OD-CED combines an unsupervised space coarsening technique and an encoder-decoder architecture to capture both semantic and geographic dependencies.
  • Experimental results show that OD-CED outperforms traditional statistical methods, achieving significant reductions in root-mean-square error and weighted mean absolute percentage error.
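
Assuming the standard definitions (the paper may aggregate differently, for example per time slot), the two reported error metrics can be computed as follows. WMAPE divides the total absolute error by total demand, so OD pairs with zero demand do not cause a division by zero as they would with plain MAPE, which suits sparse fine-grained OD matrices.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def wmape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Weighted MAPE: total absolute error divided by total absolute demand."""
    denom = sum(abs(t) for t in y_true)
    if denom == 0:
        raise ValueError("total demand is zero")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / denom
```

For demands [10, 0, 5, 5] and predictions [8, 1, 5, 6], RMSE is √1.5 ≈ 1.22 and WMAPE is 4/20 = 0.2.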

Advances in Continual Graph Learning for Anti-Money Laundering Systems: A Comprehensive Review

  • Financial institutions are required to monitor vast amounts of transactions for money laundering.
  • Traditional machine learning models are limited in their ability to adapt to the dynamic environments of anti-money laundering (AML) detection.
  • Continual graph learning approaches can enhance AML practices by incorporating new information while retaining prior knowledge.
  • Experimental evaluations show that continual learning improves model adaptability and robustness in detecting money laundering.
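
One widely used continual-learning strategy for retaining prior knowledge is rehearsal: keep a bounded buffer of past examples (here, say, labeled transactions or subgraphs) and mix them into each update on new data. The sketch below is a reservoir-sampling buffer, which holds a uniform sample of everything seen so far; the summary does not say which strategies the review covers, so treat this as a generic example with hypothetical names.

```python
import random
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class ReservoirBuffer(Generic[T]):
    """Fixed-capacity replay buffer using reservoir sampling (Algorithm R)."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        self.capacity = capacity
        self.items: List[T] = []
        self.seen = 0
        self._rng = random.Random(seed)

    def add(self, item: T) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Replace a random slot with probability capacity / seen.
            j = self._rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k: int) -> List[T]:
        """Draw up to k stored examples to mix into the next training batch."""
        return self._rng.sample(self.items, min(k, len(self.items)))
```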

Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality

  • Researchers propose a new evaluation metric called Approximate Feature Activation (AFA) for assessing alignment between inputs and activations in Sparse Autoencoders (SAEs).
  • The study introduces a novel SAE architecture called top-AFA SAE, which eliminates the need to tune SAE sparsity hyperparameters.
  • The top-AFA SAEs achieve reconstruction loss comparable to state-of-the-art top-k SAEs without requiring the hyperparameter k to be tuned.
  • The proposed method also introduces the ZF plot, revealing a relationship between large language model hidden embeddings and SAE feature vectors.
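
For reference, the top-k SAE baseline mentioned above keeps only the k largest latent activations per input, and k is the hyperparameter that top-AFA is said to remove. This is a minimal forward pass in one common formulation (ReLU pre-activations, then top-k, then a linear decoder); bias placement varies between implementations, and top-AFA itself is not shown.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def topk_sae_forward(
    x: Sequence[float],
    W_enc: Matrix,          # one row per latent feature
    b_enc: Sequence[float],
    W_dec: Matrix,          # one row per input dimension, one column per latent
    b_dec: Sequence[float],
    k: int,
) -> Tuple[List[float], List[float]]:
    """Return (sparse latent code z, reconstruction x_hat)."""
    pre = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W_enc, b_enc)]
    # Keep the k largest activations and zero out the rest.
    top = sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k]
    z = [0.0] * len(pre)
    for i in top:
        z[i] = pre[i]
    x_hat = [sum(w * zi for w, zi in zip(row, z)) + b for row, b in zip(W_dec, b_dec)]
    return z, x_hat
```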

Value of Information-based Deceptive Path Planning Under Adversarial Interventions

  • Existing methods for deceptive path planning (DPP) do not address the problem of adversarial interventions.
  • A novel Markov decision process (MDP)-based model is proposed for DPP under adversarial interventions.
  • New value of information (VoI) objectives are developed to guide DPP policy design.
  • Computationally efficient methods are derived for synthesizing policies for DPP under adversarial interventions.
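
The summary frames DPP as an MDP. As a generic building block only, here is value iteration on a small MDP; the paper's value-of-information objectives and the adversary's interventions are not modeled, and the transition format (`transitions[state][action]` as a list of `(prob, next_state, reward)`) is an assumption.

```python
from typing import Dict, List, Tuple

# transitions[state][action] -> list of (probability, next_state, reward)
Transitions = Dict[int, Dict[str, List[Tuple[float, int, float]]]]


def q_value(transitions: Transitions, V: Dict[int, float], s: int, a: str, gamma: float) -> float:
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in transitions[s][a])


def value_iteration(
    transitions: Transitions, gamma: float = 0.9, tol: float = 1e-10
) -> Tuple[Dict[int, float], Dict[int, str]]:
    """Bellman optimality backups (in place) until values change by less than `tol`.

    States with no actions are treated as terminal (value fixed at 0).
    """
    V = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:
                continue
            best = max(q_value(transitions, V, s, a, gamma) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {
        s: max(actions, key=lambda a, s=s: q_value(transitions, V, s, a, gamma))
        for s, actions in transitions.items()
        if actions
    }
    return V, policy
```

For a chain 0 → 1 → 2 where reaching state 2 pays 1 and γ = 0.9, this returns V(1) = 1.0 and V(0) = 0.9, with "right" chosen in both non-terminal states.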
