ML News

JoFormer (Journey-based Transformer): Theory and Empirical Analysis on the Tiny Shakespeare Dataset

  • JoFormer is a journey-based Transformer architecture that incorporates positional information through learnable directional transforms.
  • It represents relative positions through sequentially composed directional transforms (a toy, rotation-only sketch of this idea follows below), outperforming the RoFormer baseline on the Tiny Shakespeare character-level language modeling task.
  • JoFormer achieves lower perplexity and faster convergence, showcasing the benefits of its more expressive treatment of positional relationships.
  • The per-token JoFormer, despite being a conceptual variant, demonstrates strong performance, hinting at its potential for more complex architectures.
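
The composed-transform idea can be illustrated with a deliberately simple special case. The sketch below uses commuting 2-D rotations with learnable, position-dependent angles, so composing transforms along the journey between two positions collapses to a cumulative sum of angles (a learnable relaxation of RoPE). JoFormer's actual directional transforms are more general and need not commute; all names and shapes here are illustrative.

```python
import torch
import torch.nn as nn

class JourneyRotations(nn.Module):
    """Toy 'journey' positional scheme: each position owns a learnable rotation
    angle, and the transform relating two positions is the composition of the
    per-step transforms between them. With commuting 2-D rotations, composition
    reduces to a cumulative sum of angles."""
    def __init__(self, max_len, head_dim):
        super().__init__()
        assert head_dim % 2 == 0
        # one learnable angle per position and per 2-D plane of the head
        self.angles = nn.Parameter(0.01 * torch.randn(max_len, head_dim // 2))

    def forward(self, x):
        # x: (batch, seq, head_dim); rotate each 2-D plane by the composed angle
        t = x.shape[1]
        theta = torch.cumsum(self.angles[:t], dim=0)     # journey up to position i
        cos, sin = theta.cos(), theta.sin()
        x1, x2 = x[..., 0::2], x[..., 1::2]
        return torch.stack((x1 * cos - x2 * sin,
                            x1 * sin + x2 * cos), dim=-1).flatten(-2)

# Applying the same journey transform to queries and keys makes each attention
# score depend only on the composed transform between the two positions.
rot = JourneyRotations(max_len=256, head_dim=64)
q, k = torch.randn(2, 128, 64), torch.randn(2, 128, 64)
scores = rot(q) @ rot(k).transpose(-1, -2)
print(scores.shape)   # torch.Size([2, 128, 128])
```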

When Simple Model Just Works: Is Network Traffic Classification in Crisis?

  • Machine learning has been used for network traffic classification for over two decades.
  • Recent findings suggest that a simple k-NN baseline using packet-sequence metadata can perform as well as, or even better than, complex neural networks (a minimal baseline of this kind is sketched below).
  • Analysis reveals that many datasets contain over 50% redundant samples, impacting model performance and accuracy estimation.
  • The study suggests that standard machine learning practices may not be suitable for network traffic classification and proposes new directions for evaluation in the field.
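
As a rough illustration of the k-NN-baseline bullet above, the sketch below trains scikit-learn's k-NN on per-flow packet-metadata vectors and also counts exact duplicate feature vectors, the kind of redundancy check the study highlights. The feature layout (signed sizes of the first 20 packets) and the random placeholder data are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Placeholder flow features: sizes of the first 20 packets, signed by direction
# (client->server positive, server->client negative). A real dataset would be
# loaded here instead of this random stand-in.
rng = np.random.default_rng(0)
X = rng.integers(-1500, 1500, size=(5000, 20)).astype(float)
y = rng.integers(0, 10, size=5000)                    # 10 application classes

# Redundancy check in the spirit of the finding that many datasets contain
# >50% duplicate samples (near zero here, since the data are random).
n_unique = np.unique(X, axis=0).shape[0]
print(f"exact-duplicate samples: {1 - n_unique / len(X):.1%}")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print("k-NN accuracy:", accuracy_score(y_te, knn.predict(X_te)))
```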

Towards Robust Real-World Multivariate Time Series Forecasting: A Unified Framework for Dependency, Asynchrony, and Missingness

  • Real-world time series data are inherently multivariate, often exhibiting complex inter-channel dependencies.
  • The proposed ChannelTokenFormer is a Transformer-based forecasting model that addresses channel dependency, asynchrony, and missingness in real-world scenarios (the channel-as-token idea is sketched below).
  • The model is designed to capture cross-channel interactions, accommodate channel-wise asynchronous sampling, and handle missing values effectively.
  • Experiments on benchmark datasets and a real-world industrial dataset show that ChannelTokenFormer outperforms existing architectures in robustness and accuracy under challenging conditions.
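
A minimal sketch of the channel-as-token idea, assuming one token per channel built from the channel's values concatenated with its observation mask (so missing entries are explicit), with self-attention mixing information across channel tokens. This is not the actual ChannelTokenFormer architecture; layer sizes and the forecasting head are illustrative.

```python
import torch
import torch.nn as nn

class ChannelTokenEncoder(nn.Module):
    """Sketch of 'one token per channel': each channel's (possibly asynchronous,
    partially missing) series is embedded independently, then self-attention
    mixes information across channel tokens."""
    def __init__(self, seq_len, d_model=64, n_heads=4):
        super().__init__()
        # value + observation mask are embedded together, so missing entries
        # (mask=0, value filled with 0) are distinguishable from true zeros
        self.embed = nn.Linear(2 * seq_len, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, 1)        # e.g. one-step-ahead forecast per channel

    def forward(self, x, mask):
        # x, mask: (batch, channels, seq_len); mask is 1 where observed
        tokens = self.embed(torch.cat([x * mask, mask], dim=-1))
        mixed, _ = self.attn(tokens, tokens, tokens)   # cross-channel interaction
        return self.head(mixed).squeeze(-1)            # (batch, channels)

x = torch.randn(8, 5, 96)                        # 5 channels, 96 time steps
mask = (torch.rand(8, 5, 96) > 0.3).float()      # ~30% of entries missing
print(ChannelTokenEncoder(96)(x, mask).shape)    # torch.Size([8, 5])
```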

Optimizing Learned Image Compression on Scalar and Entropy-Constraint Quantization

  • Continuous improvements in image compression with variational autoencoders have led to competitive learned codecs.
  • Quantization during the training process poses challenges due to zero derivatives, requiring differentiable approximations for optimization.
  • The proposed method retrains parts of the network on quantized latents after end-to-end training so that they more accurately model the true quantization noise (the usual differentiable quantization proxies are sketched below).
  • Results show additional coding gain for both uniform scalar and entropy-constrained quantization without increasing complexity, with average bitrate savings of up to 2.2%.
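
The differentiable-approximation point above usually refers to tricks such as straight-through rounding or additive uniform noise. The sketch below shows both common proxies; the paper's contribution, as summarized, is to retrain parts of the network on hard-quantized latents after end-to-end training, which is only indicated in a comment here.

```python
import torch

def quantize_ste(y):
    """Straight-through rounding: hard round in the forward pass, identity
    gradient in the backward pass (a common differentiable stand-in for scalar
    quantization during end-to-end training)."""
    return y + (torch.round(y) - y).detach()

def quantize_noise(y):
    """Additive uniform noise proxy, another widely used training-time
    approximation of quantization."""
    return y + torch.empty_like(y).uniform_(-0.5, 0.5)

# Per the summary: after end-to-end training with a differentiable proxy, freeze
# the encoder and retrain the remaining parts on *hard-quantized* latents so they
# see the true quantization noise.
y = torch.randn(4, 192, 16, 16, requires_grad=True)   # toy latent tensor
y_hat = quantize_ste(y)
y_hat.sum().backward()
print(torch.allclose(y.grad, torch.ones_like(y)))      # True: gradients pass through
```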

Mitigating Reward Over-optimization in Direct Alignment Algorithms with Importance Sampling

  • Direct Alignment Algorithms (DAAs) like Direct Preference Optimization (DPO) are being used as alternatives to Reinforcement Learning from Human Feedback for aligning large language models with human values.
  • These methods are prone to over-optimization, causing the model to deviate from the reference policy, leading to decreased performance during training.
  • A new approach, Importance-Sampling DAAs (IS-DAAs), addresses over-optimization in offline DAAs by multiplying the objective with an importance ratio derived from the reference policy distribution (an importance-weighted DPO loss is sketched below).
  • Experiments show that IS-DAAs effectively mitigate over-optimization, particularly at low regularization strength, outperforming other methods that target this problem.
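
A hedged sketch of an importance-weighted direct-alignment objective: the standard DPO loss with an optional per-example weight multiplying the per-pair term. How IS-DAAs construct and clip those weights from the reference-policy distribution follows the paper; the weights below are placeholders.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_logratios, ref_logratios, beta=0.1, importance_weights=None):
    """Standard DPO loss on (chosen - rejected) log-probability ratios, with an
    optional per-example importance weight multiplying the per-pair objective.
    The IS-DAA construction of these weights is not reproduced here."""
    logits = beta * (policy_logratios - ref_logratios)
    losses = -F.logsigmoid(logits)
    if importance_weights is not None:
        losses = importance_weights * losses
    return losses.mean()

policy_lr = torch.randn(32, requires_grad=True)  # log p(chosen) - log p(rejected), policy
ref_lr = torch.randn(32)                         # same quantity under the frozen reference
weights = torch.exp(torch.randn(32)).clamp(max=10.0)   # placeholder clipped IS weights
loss = dpo_loss(policy_lr, ref_lr, importance_weights=weights)
loss.backward()
print(float(loss))
```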

Breaking the ICE: Exploring promises and challenges of benchmarks for Inference Carbon & Energy estimation for LLMs

  • Generative AI, particularly Large Language Models (LLMs), has been found to strain energy grids and the environment, posing a challenge to sustainability goals.
  • Current tools for monitoring and estimating energy consumption have limitations such as high input data requirements and high error margins.
  • A new framework, R-ICE, proposes using LLM benchmarks to estimate inference carbon emissions accurately and non-intrusively, enabling emerging use cases such as dynamic LLM routing and carbon accounting (a back-of-the-envelope benchmark-driven estimate is sketched below).
  • The validation results of the framework show promise, indicating the potential of benchmark-based modeling for inference emission estimation, encouraging further exploration in the scientific community.
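
Not R-ICE itself, but a back-of-the-envelope example of the kind of benchmark-driven accounting such a framework enables: energy-per-token figures measured offline on benchmarks are combined with request token counts, data-center overhead, and grid carbon intensity. Every constant below (joules per token, PUE, grid intensity, the prompt-token discount) is an illustrative assumption.

```python
def estimate_inference_emissions(prompt_tokens, output_tokens,
                                 joules_per_output_token,
                                 pue=1.2, grid_gco2_per_kwh=400.0):
    """Rough inference-emission estimate of the kind a benchmark-driven framework
    can parameterize: energy-per-token figures come from offline benchmark runs
    rather than per-request instrumentation. All constants are placeholders."""
    # assumption: output tokens dominate, prompt tokens counted at a reduced rate
    tokens = output_tokens + 0.1 * prompt_tokens
    energy_kwh = tokens * joules_per_output_token / 3.6e6   # J -> kWh
    return energy_kwh * pue * grid_gco2_per_kwh             # grams CO2e

# e.g. dynamic routing: pick the model with the lower estimate for this request
for model, j_per_tok in {"small-llm": 0.3, "large-llm": 2.5}.items():
    print(model, round(estimate_inference_emissions(500, 300, j_per_tok), 4), "gCO2e")
```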

Urban Incident Prediction with Graph Neural Networks: Integrating Government Ratings and Crowdsourced Reports

  • Graph neural networks (GNNs) are utilized for urban spatiotemporal forecasting to predict infrastructure problems like potholes or rodent issues.
  • A multiview, multioutput GNN-based model is proposed to integrate government inspection ratings and crowdsourced reports when predicting the true latent incident state of each neighborhood (a two-head message-passing sketch follows below).
  • A dataset of 9,615,863 crowdsourced reports and 1,041,415 government inspection ratings over 3 years in New York City across 139 types of incidents is collected, standardized, and made publicly available.
  • The model shows improved prediction of latent states by combining rating data with reporting data, especially in scenarios with sparse rating data and predictive reports, while highlighting demographic biases in crowdsourced reporting.
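
A minimal sketch of the multiview, multioutput idea: a shared per-node latent state is propagated over the neighborhood graph and read out by two heads, one for inspection ratings and one for report counts. The actual model, graph construction, and training objective follow the paper; everything here is a toy.

```python
import torch
import torch.nn as nn

class TwoViewIncidentGNN(nn.Module):
    """Toy multiview, multioutput GNN: a shared per-node latent state feeds one
    head for government inspection ratings and one for crowdsourced report
    counts."""
    def __init__(self, in_dim, hidden=32):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hidden)
        self.lin2 = nn.Linear(hidden, hidden)
        self.rating_head = nn.Linear(hidden, 1)    # government-rating view
        self.report_head = nn.Linear(hidden, 1)    # report-count view (e.g. log-rate)

    def forward(self, x, adj_norm):
        h = torch.relu(adj_norm @ self.lin1(x))    # aggregate neighboring areas
        h = torch.relu(adj_norm @ self.lin2(h))    # h ~ latent incident state
        return self.rating_head(h), self.report_head(h)

n = 10                                             # toy set of neighborhoods
a = (torch.rand(n, n) < 0.3).float()
adj = ((a + a.T + torch.eye(n)) > 0).float()
adj_norm = adj / adj.sum(dim=1, keepdim=True)      # row-normalized adjacency
ratings, reports = TwoViewIncidentGNN(in_dim=8)(torch.randn(n, 8), adj_norm)
print(ratings.shape, reports.shape)                # torch.Size([10, 1]) twice
```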

IMAGIC-500: IMputation benchmark on A Generative Imaginary Country (500k samples)

  • Missing data imputation in tabular datasets is a challenge in data science and machine learning, especially in socioeconomic research.
  • Strict data protection protocols limit the sharing of real-world socioeconomic datasets, hindering reproducibility and benchmark studies.
  • Researchers created the IMAGIC-500 dataset using the World Bank's synthetic dataset to evaluate missing data imputation methods on socioeconomic features.
  • The benchmark assesses imputation accuracy under various missingness mechanisms and ratios, aiming to advance the development of robust imputation algorithms for social science research (an evaluation loop of this shape is sketched below).
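
A sketch of the evaluation loop such a benchmark implies: inject missingness at a chosen ratio, impute, and score reconstruction error on the held-out entries. Only MCAR masking is shown (MAR/MNAR masks would be drawn conditionally on other values), and the data here are random placeholders rather than IMAGIC-500.

```python
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

rng = np.random.default_rng(0)
X_full = rng.normal(size=(2000, 8))          # stand-in for tabular survey features

def evaluate_imputer(imputer, missing_ratio):
    """Score one imputer under one MCAR setting: mask entries at random, impute,
    and compute RMSE on the masked (held-out) entries."""
    mask = rng.random(X_full.shape) < missing_ratio
    X_miss = X_full.copy()
    X_miss[mask] = np.nan
    X_imp = imputer.fit_transform(X_miss)
    return float(np.sqrt(np.mean((X_imp[mask] - X_full[mask]) ** 2)))

for ratio in (0.1, 0.3, 0.5):
    for name, imp in [("mean", SimpleImputer(strategy="mean")),
                      ("knn", KNNImputer(n_neighbors=5))]:
        print(f"MCAR {ratio:.0%} {name}: RMSE={evaluate_imputer(imp, ratio):.3f}")
```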

Agile Reinforcement Learning for Real-Time Task Scheduling in Edge Computing

  • Soft real-time applications in edge computing pose challenges for task scheduling while meeting timing constraints.
  • Schedulers based on heuristic algorithms struggle to adapt to dynamic edge computing environments.
  • Agile Reinforcement Learning (aRL), proposed for task scheduling in edge computing, enhances the predictability and adaptability of the RL agent (a toy learned-dispatcher simulation follows below).
  • Experiments show that aRL achieves a higher hit-ratio and converges faster compared to baseline approaches.
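
A toy stand-in for an RL scheduler, not the aRL method itself: a tabular epsilon-greedy learner picks a dispatch heuristic at each decision and is rewarded when the dispatched task meets its deadline, with the deadline hit-ratio tracked as in the summary. The task arrival and timing distributions are arbitrary assumptions.

```python
import random

# Toy soft real-time scheduling loop: tasks arrive as (exec_time, deadline); a
# tabular learner chooses which dispatch rule to apply and is rewarded for
# deadline hits. Deliberately tiny and generic, not the aRL algorithm.
random.seed(0)
ACTIONS = ("EDF", "SJF")              # earliest-deadline-first vs shortest-job-first
Q, alpha, eps = {}, 0.1, 0.1
now, queue, hits, total = 0.0, [], 0, 0

for _ in range(20000):
    for _ in range(random.randint(0, 2)):                        # bursty arrivals
        queue.append((random.uniform(1, 4), now + random.uniform(4, 15)))
    if not queue:
        now += 1.0
        continue
    s = (min(len(queue), 5), any(d - now < e for e, d in queue))  # coarse state
    a = random.randrange(2) if random.random() < eps else \
        max(range(2), key=lambda i: Q.get((s, i), 0.0))
    task = min(queue, key=(lambda t: t[1]) if ACTIONS[a] == "EDF" else (lambda t: t[0]))
    queue.remove(task)
    now += task[0]                                  # run the dispatched task
    hit = 1.0 if now <= task[1] else 0.0            # did it meet its deadline?
    hits, total = hits + hit, total + 1
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (hit - Q.get((s, a), 0.0))

print(f"deadline hit-ratio: {hits / total:.2%}")
```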

Adapting to Heterophilic Graph Data with Structure-Guided Neighbor Discovery

  • Graph Neural Networks (GNNs) struggle with heterophilic data, where connected nodes may have dissimilar labels, because local message passing bakes in a homophily assumption.
  • A new approach proposes creating alternative graph structures by linking nodes with similar structural attributes to improve label homophily (a k-nearest-neighbour construction of such a graph is sketched below).
  • Theoretical proof suggests GNN performance improvement by utilizing graphs with fewer false positive edges and considering multiple graph views.
  • Structure-Guided GNN (SG-GNN) is introduced as an architecture that processes original and newly created structural graphs to achieve state-of-the-art or highly competitive performance on datasets with heterophilic characteristics.
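
A sketch of the structure-guided neighbor discovery idea: build a second graph by linking nodes whose structural attributes (here, just node degree) are most similar, then process the original and the constructed graph with separate propagation and fuse the two views. SG-GNN's actual structural attributes and architecture follow the paper.

```python
import torch
import torch.nn as nn

def knn_graph_from_features(feats, k=5):
    """Link every node to the k nodes with the most similar feature vectors --
    the 'alternative graph' built from structural attributes."""
    d = torch.cdist(feats, feats)
    d.fill_diagonal_(float("inf"))
    idx = d.topk(k, largest=False).indices
    adj = torch.zeros(len(feats), len(feats))
    adj.scatter_(1, idx, 1.0)
    adj = ((adj + adj.T + torch.eye(len(feats))) > 0).float()
    return adj / adj.sum(dim=1, keepdim=True)

class TwoViewGCN(nn.Module):
    """Propagates over the original graph and the structure-derived graph with
    separate weights, then fuses the two views for node classification."""
    def __init__(self, in_dim, hidden, n_classes):
        super().__init__()
        self.lin_orig = nn.Linear(in_dim, hidden)
        self.lin_struct = nn.Linear(in_dim, hidden)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, x, adj_orig, adj_struct):
        h = torch.cat([torch.relu(adj_orig @ self.lin_orig(x)),
                       torch.relu(adj_struct @ self.lin_struct(x))], dim=-1)
        return self.out(h)

n, x = 50, torch.randn(50, 16)
a = (torch.rand(n, n) < 0.1).float()
adj_orig = ((a + a.T + torch.eye(n)) > 0).float()
degree = adj_orig.sum(dim=1, keepdim=True)              # toy structural attribute
adj_struct = knn_graph_from_features(degree, k=5)
adj_orig = adj_orig / adj_orig.sum(dim=1, keepdim=True)
print(TwoViewGCN(16, 32, 4)(x, adj_orig, adj_struct).shape)   # torch.Size([50, 4])
```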

InfoDPCCA: Information-Theoretic Dynamic Probabilistic Canonical Correlation Analysis

  • InfoDPCCA is a dynamic probabilistic Canonical Correlation Analysis (CCA) framework designed for extracting meaningful latent representations from high-dimensional sequential data in machine learning.
  • It leverages an information-theoretic objective to extract a shared latent representation capturing the mutual structure between data streams, as well as separate latent components specific to each sequence (the shared/private decomposition is sketched below).
  • Unlike previous dynamic CCA models, InfoDPCCA ensures the shared latent space encodes only the mutual information between sequences, enhancing interpretability and robustness.
  • Experiments on synthetic and medical fMRI data show that InfoDPCCA is proficient in representation learning, with code available at https://github.com/marcusstang/InfoDPCCA.
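
A structural sketch of the shared-plus-private latent decomposition described above: a shared code extracted from both streams and stream-specific codes, trained here with a plain reconstruction loss. InfoDPCCA's information-theoretic objective and temporal dynamics are not reproduced; this only shows the latent layout.

```python
import torch
import torch.nn as nn

class SharedPrivateEncoder(nn.Module):
    """Structural sketch only: a shared latent z0 from both streams plus
    stream-specific latents z1, z2; each stream is reconstructed from (z0, own
    private latent). Not the InfoDPCCA objective."""
    def __init__(self, d1, d2, d_shared=4, d_private=4):
        super().__init__()
        self.shared = nn.Linear(d1 + d2, d_shared)   # z0: mutual structure
        self.priv1 = nn.Linear(d1, d_private)        # z1: specific to stream 1
        self.priv2 = nn.Linear(d2, d_private)        # z2: specific to stream 2
        self.dec1 = nn.Linear(d_shared + d_private, d1)
        self.dec2 = nn.Linear(d_shared + d_private, d2)

    def forward(self, x1, x2):
        z0 = self.shared(torch.cat([x1, x2], dim=-1))
        r1 = self.dec1(torch.cat([z0, self.priv1(x1)], dim=-1))
        r2 = self.dec2(torch.cat([z0, self.priv2(x2)], dim=-1))
        return r1, r2

x1, x2 = torch.randn(64, 10), torch.randn(64, 20)    # two co-recorded streams
r1, r2 = SharedPrivateEncoder(10, 20)(x1, x2)
loss = ((r1 - x1) ** 2).mean() + ((r2 - x2) ** 2).mean()
loss.backward()
print(float(loss))
```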

Intention-Conditioned Flow Occupancy Models

  • Large-scale pre-training has enabled foundation models to be adapted and fine-tuned to specific tasks; the same framework is now being applied to reinforcement learning to address core challenges such as sample efficiency and robustness.
  • In this context, a probabilistic model called intention-conditioned flow occupancy models (InFOM) has been developed to predict which states an agent will visit in the future, combining flow matching with latent variables that capture user intention (the generic conditional flow-matching recipe is sketched below), leading to improved returns and success rates on benchmark tasks.
  • The InFOM method outperformed alternative pre-training methods in experiments conducted on 36 state-based and 4 image-based benchmark tasks, achieving a 1.8 times median improvement in returns and increasing success rates by 36%.
  • More details can be found on the website https://chongyi-zheng.github.io/infom, and the code is available at https://github.com/chongyi-zheng/infom.
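
A generic conditional flow-matching sketch in the spirit of the description above: a velocity field conditioned on the current state and a latent intention is trained to transport noise onto states visited later in the trajectory. InFOM's intention encoder, datasets, and RL integration are not shown; the batch data and latent here are random placeholders.

```python
import torch
import torch.nn as nn

# Generic conditional flow matching for "which future states will be visited":
# a velocity field conditioned on current state s and latent intention z learns
# to transport Gaussian noise onto later-visited states. Placeholder data only.
state_dim, z_dim = 6, 4
vel_net = nn.Sequential(nn.Linear(2 * state_dim + z_dim + 1, 128),
                        nn.ReLU(),
                        nn.Linear(128, state_dim))
opt = torch.optim.Adam(vel_net.parameters(), lr=1e-3)

for step in range(200):
    s = torch.randn(256, state_dim)             # current states (placeholder batch)
    future = s + torch.randn(256, state_dim)    # states visited later (placeholder)
    z = torch.randn(256, z_dim)                 # latent intention (placeholder encoder)
    x0 = torch.randn(256, state_dim)            # base noise sample
    t = torch.rand(256, 1)
    x_t = (1 - t) * x0 + t * future             # point on the straight path
    target_v = future - x0                      # constant velocity of that path
    pred_v = vel_net(torch.cat([x_t, s, z, t], dim=-1))
    loss = ((pred_v - target_v) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final flow-matching loss:", round(float(loss), 4))
```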

Enhancing generalizability of model discovery across parameter space with multi-experiment equation learning (ME-EQL)

  • Agent-based modeling (ABM) is computationally intensive and not analytically tractable for understanding self-organizing biological systems.
  • Equation learning (EQL) methods can derive continuum models from ABM data, but concerns about generalizability arise due to the need for extensive simulations for each parameter set.
  • Multi-experiment equation learning (ME-EQL) introduces two methods, one-at-a-time ME-EQL (OAT ME-EQL) and embedded-structure ME-EQL (ES ME-EQL), to improve generalizability across parameter space (a miniature OAT-style example follows below).
  • Demonstrated on birth-death mean-field and on-lattice agent-based models, the ME-EQL methods reduce the relative error in recovering parameters from agent-based simulations, with OAT ME-EQL showing better generalizability.
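
A miniature one-at-a-time example of the idea: learn an ODE right-hand side from simulations at several parameter values, then model how the learned coefficients vary with the parameter so the equation transfers to unseen parameter values. Logistic growth stands in for the birth-death mean-field model; the term library, fitting, and interpolation choices are illustrative, not the paper's.

```python
import numpy as np

# Miniature OAT-style equation learning across a parameter sweep; ground truth is
# logistic growth du/dt = r*u*(1 - u).
t = np.linspace(0.0, 10.0, 400)

def simulate(r, u0=0.05):
    u = np.empty_like(t)
    u[0] = u0
    for i in range(1, len(t)):
        u[i] = u[i - 1] + (t[i] - t[i - 1]) * r * u[i - 1] * (1 - u[i - 1])  # Euler step
    return u

train_rs = [0.4, 0.8, 1.2, 1.6]
coeffs = []
for r in train_rs:
    u = simulate(r)
    dudt = np.gradient(u, t)
    library = np.column_stack([u, u ** 2])              # candidate right-hand-side terms
    c, *_ = np.linalg.lstsq(library, dudt, rcond=None)
    coeffs.append(c)                                    # one learned equation per r
coeffs = np.array(coeffs)

# Generalize across parameter space: a linear fit of each coefficient against r,
# evaluated at an unseen parameter value.
r_new = 1.0
c_new = [np.polyval(np.polyfit(train_rs, coeffs[:, j], 1), r_new) for j in range(2)]
print("learned at r=%.1f: du/dt ≈ %.2f*u + (%.2f)*u^2  (true: %.1f*u - %.1f*u^2)"
      % (r_new, c_new[0], c_new[1], r_new, r_new))
```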

Local MDI+: Local Feature Importances for Tree-Based Models

  • Tree-based models like random forests are favored over deep learning for tabular data due to their prediction performance and efficiency.
  • Local Feature Importances (LFI) methods such as LIME and TreeSHAP provide sample-specific explanations but have limitations.
  • MDI+ is a global feature importance method and therefore cannot explain individual predictions with diverse characteristics.
  • To address this, Local MDI+ (LMDI+) has been introduced; it outperforms LIME and TreeSHAP at identifying instance-specific signal features and enhances interpretability (a TreeSHAP-based local-explanation baseline is sketched below).
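
LMDI+ itself is not reproduced here; the sketch below shows the kind of per-sample explanation it is benchmarked against, using TreeSHAP from the third-party shap package on a random forest fit to synthetic data.

```python
import numpy as np
import shap                                   # third-party package: pip install shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Per-sample (local) explanations for a tree ensemble via TreeSHAP -- one of the
# baselines LMDI+ is compared against.
X, y = make_regression(n_samples=500, n_features=8, n_informative=3, random_state=0)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(rf)
local_importances = explainer.shap_values(X[:5])      # shape (5, 8): one row per sample
for i, row in enumerate(local_importances):
    top = np.argsort(-np.abs(row))[:3]
    print(f"sample {i}: top local features {top.tolist()}")
```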

KARMA: A Multilevel Decomposition Hybrid Mamba Framework for Multivariate Long-Term Time Series Forecasting

  • A new framework called KARMA has been developed for multivariate long-term time series forecasting.
  • KARMA uses an Adaptive Time Channel Decomposition module (ATCD) and a Hybrid Frequency-Time Decomposition module (HFTD) to extract trend and seasonal components (the basic decompose-then-model pattern is sketched below).
  • The framework integrates a multi-scale Mamba-based KarmaBlock to process global and local information efficiently.
  • Experiments on real-world datasets show that KARMA outperforms mainstream baseline methods in predictive accuracy and computational efficiency.
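
The decompose-then-model pattern can be illustrated with a plain moving-average split into trend and remainder. KARMA's ATCD/HFTD modules are adaptive and frequency-aware, and its KarmaBlocks are Mamba-based sequence models; none of that is reproduced here.

```python
import numpy as np

def decompose(series, kernel=25):
    """Plain moving-average decomposition into a smoothed trend and the
    remaining (seasonal + residual) component; only illustrates the pattern of
    decomposing first and modeling each component separately."""
    pad = kernel // 2
    padded = np.pad(series, (pad, pad), mode="edge")
    trend = np.convolve(padded, np.ones(kernel) / kernel, mode="valid")
    return trend, series - trend

t = np.arange(500)
rng = np.random.default_rng(0)
series = 0.01 * t + np.sin(2 * np.pi * t / 24) + 0.1 * rng.normal(size=t.size)
trend, remainder = decompose(series)
# Each component would then go to its own sequence model (Mamba-based KarmaBlocks
# in KARMA) before the forecasts are recombined.
print(trend.shape, remainder.shape)
```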
