ML News

Source: Arxiv

DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster

  • Researchers have introduced DiLoCoX, a low-communication framework for large-scale distributed training of large language models over decentralized clusters.
  • DiLoCoX combines Pipeline Parallelism, a Dual Optimizer Policy, One-Step-Delay Overlap of Communication, and an Adaptive Gradient Compression Scheme to improve the scalability and speed of model pre-training (a generic gradient-compression sketch follows this list).
  • The framework enables pre-training a 107B foundation model over a 1Gbps network, achieving a 357x speedup in distributed training compared to vanilla AllReduce with minimal impact on model convergence.
  • This marks the first successful application of a decentralized training framework to models exceeding 100 billion parameters.
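
The summary above does not detail the compression scheme, so the snippet below is a minimal, generic sketch of one widely used low-communication technique: top-k gradient sparsification with error feedback. The class name, the 1% ratio, and the error-feedback buffer are illustrative assumptions, not DiLoCoX's actual implementation.

```python
import numpy as np

class TopKCompressor:
    """Generic top-k gradient sparsifier with error feedback (illustrative only)."""

    def __init__(self, ratio: float = 0.01):
        self.ratio = ratio        # fraction of gradient entries that get communicated
        self.residual = None      # error-feedback buffer of dropped gradient mass

    def compress(self, grad: np.ndarray):
        if self.residual is None:
            self.residual = np.zeros_like(grad)
        corrected = grad + self.residual              # re-add previously dropped mass
        flat = corrected.ravel()
        k = max(1, int(self.ratio * flat.size))
        idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the k largest |g|
        values = flat[idx]
        dense = np.zeros_like(flat)
        dense[idx] = values
        self.residual = (flat - dense).reshape(grad.shape)
        return idx, values, grad.shape                # what a worker would transmit

    @staticmethod
    def decompress(idx, values, shape):
        out = np.zeros(int(np.prod(shape)))
        out[idx] = values
        return out.reshape(shape)
```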


Source: Arxiv

Improved seeding strategies for k-means and k-GMM

  • Researchers revisit randomized seeding techniques for k-means clustering and k-GMM, introducing new families of initialization methods (the standard k-means++ baseline is sketched after this list).
  • Experiments demonstrate constant-factor improvements over traditional methods in final clustering quality, at modest computational overhead.
  • Significant insights are gained into properties of k-means algorithms, such as correlation observations and variance reduction phenomena.
  • The newly proposed seeding methods have the potential to become standard practices and open avenues for theoretical analysis.
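
For context, the snippet below is a minimal NumPy sketch of the standard k-means++ seeding that such work typically starts from: after a uniformly random first center, each new center is drawn with probability proportional to its squared distance to the nearest center chosen so far. It is the baseline only, not the paper's new seeding families.

```python
import numpy as np

def kmeans_pp_seed(X, k, seed=None):
    """Standard k-means++ seeding over data X of shape (n, d)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = [X[rng.integers(n)]]                   # first center: uniform at random
    d2 = np.sum((X - centers[0]) ** 2, axis=1)       # squared distance to nearest center
    for _ in range(1, k):
        centers.append(X[rng.choice(n, p=d2 / d2.sum())])
        d2 = np.minimum(d2, np.sum((X - centers[-1]) ** 2, axis=1))
    return np.stack(centers)

# Usage: centers = kmeans_pp_seed(X, k=10); then run Lloyd's iterations from them.
```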


Source: Arxiv

AGTCNet: A Graph-Temporal Approach for Principled Motor Imagery EEG Classification

  • A new study introduces AGTCNet, a graph-temporal model for motor imagery EEG classification in brain-computer interface technology.
  • AGTCNet leverages graph convolutional attention networks to capture spatiotemporal dependencies in EEG signals effectively (a generic graph-attention layer is sketched after this list).
  • The model outperformed existing classifiers, achieving state-of-the-art performance with reduced model size and faster inference time.
  • AGTCNet demonstrated high accuracies for subject-independent and subject-specific classifications on various EEG datasets, showcasing its practicality for BCI deployment.
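
AGTCNet's exact architecture is not reproduced here; the snippet below is a generic single-head graph-attention layer over EEG channels, sketched only to show how attention over an electrode adjacency graph can capture spatial dependencies. The feature dimensions and the 0/1 adjacency mask (with self-loops) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelGraphAttention(nn.Module):
    """Single-head GAT-style layer over EEG electrodes (generic sketch)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (channels, in_dim) per-electrode features
        # adj: (channels, channels) 0/1 mask; include self-loops so no row is empty
        h = self.proj(x)                                          # (C, D)
        C = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(C, C, -1),
                           h.unsqueeze(0).expand(C, C, -1)], dim=-1)
        scores = F.leaky_relu(self.attn(pairs).squeeze(-1))       # (C, C) pairwise scores
        scores = scores.masked_fill(adj == 0, float("-inf"))
        alpha = torch.softmax(scores, dim=-1)                     # attention over neighbours
        return alpha @ h                                          # aggregated channel features
```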


Source: Arxiv

DynamicBench: Evaluating Real-Time Report Generation in Large Language Models

  • DynamicBench is a new benchmark designed to evaluate the ability of large language models (LLMs) to store and process up-to-the-minute data for real-time information processing in applications.
  • The benchmark uses a dual-path retrieval pipeline combining web searches and local report databases, requiring domain-specific knowledge for accurate responses within specialized fields.
  • DynamicBench assesses LLMs in scenarios with or without external documents, measuring their capacity to autonomously process recent information or utilize contextual enhancements.
  • Experimental results show the accompanying report generation system outperforming GPT-4o by 7.0% in document-free scenarios and 5.8% in document-assisted scenarios, demonstrating effective synthesis of dynamic information.


Source: Arxiv

SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning

  • Multimodal in-context learning (ICL) in the medical domain is explored in a new study, highlighting its potential for tasks requiring adaptation from limited examples.
  • SMMILE, an expert-curated benchmark of multimodal medical ICL tasks, was introduced; it consists of 111 problems covering 6 medical specialties and 13 imaging modalities.
  • The study evaluated 15 multimodal large language models (MLLMs) on SMMILE, revealing moderate to poor multimodal ICL ability on medical tasks.
  • ICL contributes only a slight improvement over zero-shot performance on SMMILE, with findings indicating susceptibility to irrelevant in-context examples and the impact of example ordering.


Source: Arxiv

MAx-DNN: Multi-Level Arithmetic Approximation for Energy-Efficient DNN Hardware Accelerators

  • The paper titled 'MAx-DNN' explores the use of fine-grained error resilience and hardware approximation techniques for energy-efficient Deep Neural Network (DNN) computing.
  • It applies approximate multipliers at different levels of granularity within the network to achieve higher energy efficiency while keeping accuracy loss acceptable (a toy approximate multiplier is sketched after this list).
  • Experiments conducted on the ResNet-8 model using the CIFAR-10 dataset showed up to 54% energy gains at the cost of up to 4% accuracy loss compared to the baseline model, and 2x energy gains with improved accuracy compared to current DNN approximations.
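
As a toy illustration of hardware approximation, the function below mimics one simple approximate-multiplier idea: zeroing the low-order bits of each operand before multiplying, which trades accuracy for cheaper logic. The specific approximation circuits, and where in the network they are placed, are what MAx-DNN studies and are not reproduced here.

```python
def approx_multiply(a: int, b: int, drop_bits: int = 4) -> int:
    """Toy approximate multiplier for unsigned fixed-point operands (illustrative only)."""
    a_trunc = (a >> drop_bits) << drop_bits   # zero the low-order bits of a
    b_trunc = (b >> drop_bits) << drop_bits   # zero the low-order bits of b
    return a_trunc * b_trunc

# Exact: 157 * 93 = 14601; approximate: 144 * 80 = 11520 (error grows with drop_bits)
print(approx_multiply(157, 93))
```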


Source: Arxiv

Pay Attention to Small Weights

  • Finetuning large pretrained neural networks can be resource-intensive in terms of memory and computational cost.
  • Researchers have observed a correlation between large gradients and small-magnitude weights during finetuning.
  • NANOADAM is proposed: it dynamically updates only the small-magnitude weights during finetuning, which lets the parameter subset be chosen without any gradient information and yielded better generalization in experiments (a minimal sketch follows this list).
  • By preserving the large-magnitude weights that encode critical pretrained features, the method also permits larger learning rates and shows benefits on both NLP and vision tasks.
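
The snippet below is a minimal sketch of the idea just described: an Adam-style step applied only to the smallest-magnitude fraction of weights. The quantile threshold, the 20% fraction, and the function name are illustrative assumptions, not the authors' NANOADAM implementation.

```python
import torch

def small_weight_adam_step(param, grad, state, lr=1e-3, keep_frac=0.2,
                           betas=(0.9, 0.999), eps=1e-8):
    """Adam-style update restricted to the smallest-magnitude weights (sketch only)."""
    if not state:                                    # lazily initialise optimizer state
        state.update(m=torch.zeros_like(param), v=torch.zeros_like(param), t=0)
    state["t"] += 1
    # The mask picks the smallest keep_frac of weights by magnitude; note that this
    # selection needs no gradient information, only the current weight values.
    threshold = torch.quantile(param.detach().abs().flatten(), keep_frac)
    mask = (param.detach().abs() <= threshold).to(param.dtype)

    state["m"] = betas[0] * state["m"] + (1 - betas[0]) * grad
    state["v"] = betas[1] * state["v"] + (1 - betas[1]) * grad * grad
    m_hat = state["m"] / (1 - betas[0] ** state["t"])
    v_hat = state["v"] / (1 - betas[1] ** state["t"])
    param.data -= lr * mask * m_hat / (v_hat.sqrt() + eps)   # large weights stay untouched

# Usage: state = {}; after loss.backward(), call small_weight_adam_step(p, p.grad, state)
# for every parameter p being finetuned.
```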


Source: Arxiv

Temporal-Aware Graph Attention Network for Cryptocurrency Transaction Fraud Detection

  • A new research paper introduces an Augmented Temporal-aware Graph Attention Network (ATGAT) for detecting cryptocurrency transaction fraud.
  • ATGAT aims to address the complexity and class imbalance of fraudulent-transaction detection through an advanced temporal embedding, a temporal-aware triple attention mechanism, and a weighted BCE loss (a weighted-BCE sketch follows this list).
  • Experiments on the Elliptic++ cryptocurrency dataset show that ATGAT achieves an AUC of 0.9130, outperforming traditional methods like XGBoost, GCN, and standard GAT in fraud detection.
  • The research highlights the effectiveness of temporal awareness and triple attention mechanisms in enhancing graph neural networks for fraud detection, offering more reliable tools for financial institutions.
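
The snippet below illustrates a weighted binary cross-entropy loss of the kind mentioned above, with the rare fraud class up-weighted. The class counts and the pos_weight = negatives/positives recipe are assumptions for illustration, not necessarily the paper's exact weighting.

```python
import torch
import torch.nn as nn

n_pos, n_neg = 1_500, 150_000                    # hypothetical transaction class counts
pos_weight = torch.tensor([n_neg / n_pos])       # up-weight the rare fraud class (=100.0)
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, 1)                       # raw scores for 8 transactions
labels = torch.tensor([[0.], [0.], [1.], [0.], [0.], [0.], [1.], [0.]])
loss = criterion(logits, labels)                 # each positive counts ~100x in the loss
```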


Source: Arxiv

Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference

  • Large language models (LLMs) are popular but can provide incorrect information and lack calibration, especially in critical sectors like autonomy and healthcare.
  • A new approach called ScalaBL is introduced to address uncertainty quantification in LLMs by performing Bayesian inference in a low-rank adaptation subspace.
  • ScalaBL repurposes the low-rank adaptation parameters as projection matrices, so that all variational parameters can be learned with stochastic variational inference (a generic subspace-inference sketch follows this list).
  • Despite the low dimensionality of its subspace, ScalaBL achieves competitive performance while adding only a small number of extra parameters, and it scales to larger base models than prior Bayesian LLM approaches.
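
The snippet below sketches the general pattern of stochastic variational inference in a low-dimensional weight subspace: a frozen base weight is perturbed by a learned projection of a low-dimensional Gaussian variable, trained with the reparameterisation trick plus a KL penalty. It illustrates the idea only; ScalaBL's specific reuse of LoRA factors as the projection, and its exact variational family, are not reproduced here.

```python
import torch
import torch.nn as nn

class SubspaceVariationalLinear(nn.Module):
    """Bayesian inference over an r-dimensional weight subspace (generic sketch)."""

    def __init__(self, in_dim: int, out_dim: int, r: int = 8):
        super().__init__()
        self.W0 = nn.Parameter(torch.randn(out_dim, in_dim) * 0.02, requires_grad=False)
        self.P = nn.Parameter(torch.randn(out_dim * in_dim, r) * 0.02)   # projection matrix
        self.mu = nn.Parameter(torch.zeros(r))                           # variational mean
        self.log_sigma = nn.Parameter(torch.full((r,), -3.0))            # variational log-std

    def forward(self, x):
        z = self.mu + self.log_sigma.exp() * torch.randn_like(self.mu)   # reparameterisation
        delta = (self.P @ z).view_as(self.W0)                            # subspace -> weights
        return x @ (self.W0 + delta).T

    def kl(self):
        # KL( N(mu, sigma^2) || N(0, 1) ) summed over the r subspace dimensions
        sigma2 = (2 * self.log_sigma).exp()
        return 0.5 * (sigma2 + self.mu ** 2 - 1.0 - 2 * self.log_sigma).sum()

# ELBO-style objective: task loss on a sampled weight plus a (scaled) KL penalty.
layer = SubspaceVariationalLinear(16, 8)
out = layer(torch.randn(4, 16))
loss = out.pow(2).mean() + 1e-3 * layer.kl()
```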


Source: Arxiv

Distributed Cross-Channel Hierarchical Aggregation for Foundation Models

  • Research introduces the Distributed Cross-Channel Hierarchical Aggregation (D-CHAG) approach for vision-based scientific foundation models.
  • D-CHAG is designed to handle datasets with very large numbers of channels across image modalities while improving computational efficiency (a toy channel-group aggregation sketch follows this list).
  • The approach was tested on hyperspectral imaging and weather forecasting tasks, showing significant memory reduction and increased throughput.
  • The study integrated D-CHAG with tensor parallelism and model sharding, achieving promising results on the Frontier Supercomputer.
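
As a toy illustration of cross-channel hierarchical aggregation, the module below splits a many-channel input into groups (as they might be sharded across devices), embeds each group independently, and then aggregates the group embeddings. The channel counts, group size, and mean aggregation are assumptions; this is not D-CHAG's distributed implementation.

```python
import torch
import torch.nn as nn

class HierarchicalChannelAggregator(nn.Module):
    """Group-wise channel embedding followed by cross-group aggregation (sketch)."""

    def __init__(self, channels: int = 200, group_size: int = 50, dim: int = 64):
        super().__init__()
        self.group_size = group_size
        self.embed = nn.Conv2d(group_size, dim, kernel_size=3, padding=1)  # shared per group
        self.mix = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, H, W), e.g. a hyperspectral cube with many bands
        groups = x.split(self.group_size, dim=1)               # channel groups
        embedded = [self.embed(g) for g in groups]             # embed each group separately
        fused = torch.stack(embedded, dim=0).mean(dim=0)       # aggregate across groups
        return self.mix(fused)

out = HierarchicalChannelAggregator()(torch.randn(2, 200, 32, 32))   # -> (2, 64, 32, 32)
```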


Source: Arxiv

Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning

  • Generative models like diffusion and flow-matching provide expressive policies for offline reinforcement learning.
  • A new approach called Single-Step Completion Policy (SSCP) is introduced to enhance generative policy training by predicting direct completion vectors, enabling accurate one-shot action generation.
  • SSCP combines the richness of generative models with the efficiency of unimodal policies, offering faster training and inference without long backpropagation chains (a toy single-step vs. multi-step sampling sketch follows this list).
  • SSCP not only performs well in standard offline RL and behavior cloning benchmarks but also supports goal-conditioned RL, making it a versatile and efficient framework for deep RL and sequential decision-making.
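
The snippet below contrasts, on a toy example, standard multi-step flow sampling (Euler integration of a learned velocity field) with a single-step completion head that predicts the remaining displacement in one shot, which is the general idea the summary describes. The network shapes, the 20-step integrator, and the untrained toy models are assumptions, not SSCP itself.

```python
import torch
import torch.nn as nn

velocity_net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 2))    # v(x, t)
completion_net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 2))  # jump to x1

def sample_multistep(x0: torch.Tensor, steps: int = 20) -> torch.Tensor:
    """Flow-matching style sampling: integrate the learned velocity field with Euler steps."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        t = torch.full((x.shape[0], 1), i * dt)
        x = x + dt * velocity_net(torch.cat([x, t], dim=-1))
    return x

def sample_single_step(x0: torch.Tensor) -> torch.Tensor:
    """Completion-style sampling: predict the remaining displacement in one shot."""
    t0 = torch.zeros(x0.shape[0], 1)
    return x0 + completion_net(torch.cat([x0, t0], dim=-1))

x0 = torch.randn(4, 2)                        # e.g. noise mapped to a 2-D action
print(sample_multistep(x0).shape, sample_single_step(x0).shape)
```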


Source: Arxiv

Deception Detection in Dyadic Exchanges Using Multimodal Machine Learning: A Study on a Swedish Cohort

  • This study examines the use of multimodal machine learning to detect deception in dyadic interactions, integrating data from both deceivers and deceived.
  • The study compared early and late fusion approaches using audio and video data, specifically focusing on Action Units and gaze information.
  • Results show that combining speech and facial data improves deception detection accuracy, with the best performance (71%) achieved through late fusion across modalities and participants (a minimal late-fusion sketch follows this list).
  • The research on a Swedish cohort suggests that including data from both participants improves detection accuracy and lays the groundwork for future studies in dyadic interactions, especially in psychotherapy settings.
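
The snippet below is a minimal late-fusion sketch: each modality is classified separately and the per-clip deception probabilities are combined afterwards. The equal weights and the simple weighted average are assumptions, not the study's exact fusion rule.

```python
import numpy as np

def late_fusion(prob_audio: np.ndarray, prob_video: np.ndarray, w_audio: float = 0.5):
    """Combine per-modality deception probabilities after separate classification."""
    return w_audio * prob_audio + (1.0 - w_audio) * prob_video

# Hypothetical per-clip probabilities from an audio model and a facial (AU/gaze) model
p_audio = np.array([0.62, 0.35, 0.80])
p_video = np.array([0.55, 0.20, 0.90])
fused = late_fusion(p_audio, p_video)
predictions = (fused >= 0.5).astype(int)      # 1 = deceptive
```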


Source: Arxiv

Optimising 4th-Order Runge-Kutta Methods: A Dynamic Heuristic Approach for Efficiency and Low Storage

  • Extended Stability Runge-Kutta (ESRK) methods are essential for large-scale computational problems in many fields, but optimising them for both efficiency and low storage is challenging (the classical RK4 step these schemes extend is sketched after this list).
  • A hybrid Genetic Algorithm (GA) and Reinforcement Learning (RL) approach is proposed to automate heuristic discovery for optimizing low-storage ESRK methods.
  • The new approach combines GA-driven mutations for search-space exploration and an RL-inspired state transition mechanism for heuristic selection, leading to a 25% reduction in runtime while maintaining stability and accuracy.
  • The study validates the proposed heuristic optimisation framework on benchmark problems, showcasing its potential to improve resource efficiency in high-fidelity simulations using low-storage Runge-Kutta methods.
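
For context, the classical 4th-order Runge-Kutta step that ESRK schemes generalise is shown below; the GA+RL search described above tunes extended-stability, low-storage variants of such schemes rather than this textbook tableau.

```python
import numpy as np

def rk4_step(f, t, y, h):
    """One classical 4th-order Runge-Kutta step for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Example: one step of size 0.1 on y' = -y with y(0) = 1 gives roughly exp(-0.1)
y1 = rk4_step(lambda t, y: -y, 0.0, np.array([1.0]), 0.1)
```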


Source: Arxiv

Devising a solution to the problems of Cancer awareness in Telangana

  • In Telangana, a study reveals low screening rates for cervical, breast, and oral cancer in 2020.
  • To address this, an ML classification model was developed to predict cancer susceptibility from demographic factors (a hypothetical classifier sketch follows this list).
  • A system was created to recommend nearby hospitals or cancer treatment centers based on user location and integrate health cards for maintaining medical records.
  • The aim is to increase cancer awareness, reduce mortality, and improve cancer literacy in Telangana through targeted campaigns and using machine learning algorithms.
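
A hypothetical sketch of the kind of demographic susceptibility classifier described above is shown below; the features, labels, and model choice are invented for illustration and are not the study's actual pipeline or data.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Invented toy records: [age, sex, tobacco_use, family_history]; 1 = higher susceptibility
X = [[45, 1, 0, 1], [62, 0, 1, 0], [30, 1, 0, 0], [55, 0, 1, 1]]
y = [0, 1, 0, 1]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(model.predict_proba(X_test))            # per-class susceptibility probabilities
```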


Source: Arxiv

Process mining-driven modeling and simulation to enhance fault diagnosis in cyber-physical systems

  • A novel unsupervised fault diagnosis methodology has been developed to enhance fault diagnosis in Cyber-Physical Systems (CPSs) by integrating collective anomaly detection, process mining, and stochastic simulation.
  • The methodology first detects collective anomalies in sensor data through multivariate time-series analysis, then transforms them into structured event logs from which process mining derives interpretable process models (an event-log construction sketch follows this list).
  • Incorporating timing distributions into the extracted Petri nets allows for stochastic simulation of faulty behaviors, improving root cause analysis and behavioral understanding in CPSs.
  • Experimental validation using the Robotic Arm Dataset showed the methodology's effectiveness in modeling, simulating, and classifying faulty behaviors, facilitating the development of fault dictionaries for predictive maintenance in industrial settings.
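
The snippet below sketches only the intermediate step of turning detected anomalies into a structured event log (case / activity / timestamp) that a process-mining tool could consume; the records and column names are invented for illustration, and the downstream Petri-net discovery and stochastic simulation are not shown.

```python
import pandas as pd

# Hypothetical collective anomalies detected in robotic-arm sensor streams
anomalies = [
    {"run": "arm_run_01", "fault": "joint2_overcurrent", "start": "2024-05-01 10:02:11"},
    {"run": "arm_run_01", "fault": "gripper_stall",      "start": "2024-05-01 10:02:40"},
    {"run": "arm_run_02", "fault": "joint2_overcurrent", "start": "2024-05-02 09:15:03"},
]
event_log = (pd.DataFrame(anomalies)
             .rename(columns={"run": "case_id", "fault": "activity", "start": "timestamp"}))
event_log["timestamp"] = pd.to_datetime(event_log["timestamp"])
print(event_log.sort_values(["case_id", "timestamp"]))   # ready for process-model discovery
```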
