techminis

A naukri.com initiative


ML News

Source: Arxiv

Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality

  • Researchers propose a new evaluation metric called Approximate Feature Activation (AFA) for assessing alignment between inputs and activations in Sparse Autoencoders (SAEs).
  • The study introduces a novel SAE architecture called top-AFA SAE, which eliminates the need to tune SAE sparsity hyperparameters.
  • The top-AFA SAEs achieve reconstruction loss comparable to state-of-the-art top-k SAEs without requiring the hyperparameter k to be tuned.
  • The proposed method also introduces the ZF plot, revealing a relationship between large language model hidden embeddings and SAE feature vectors.
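For context, the sparsity constraint that top-k SAEs impose, and that the top-AFA architecture aims to replace, can be sketched in a few lines of Python (an illustrative sketch, not the paper's code; `topk_sparsify` is a hypothetical helper name):

```python
def topk_sparsify(activations, k):
    """Keep the k largest activations, zero the rest (the top-k SAE constraint).

    Ties at the threshold may keep more than k entries; real implementations
    break ties explicitly.
    """
    if k <= 0:
        return [0.0] * len(activations)
    threshold = sorted(activations, reverse=True)[k - 1]
    return [a if a >= threshold else 0.0 for a in activations]

acts = [0.1, 2.3, 0.0, 1.7, 0.4]
print(topk_sparsify(acts, 2))  # [0.0, 2.3, 0.0, 1.7, 0.0]
```

The point of top-AFA is precisely that `k` here is a hyperparameter that must be tuned per model and layer; an AFA-style criterion instead derives the number of active features from alignment between inputs and activations.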


Source: Arxiv

Value of Information-based Deceptive Path Planning Under Adversarial Interventions

  • Existing methods for deceptive path planning (DPP) do not address the problem of adversarial interventions.
  • A novel Markov decision process (MDP)-based model is proposed for DPP under adversarial interventions.
  • New value of information (VoI) objectives are developed to guide DPP policy design.
  • Computationally efficient methods are derived for synthesizing policies for DPP under adversarial interventions.
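The MDP machinery underlying such models can be illustrated with plain value iteration (a minimal sketch of standard MDP solving, not the paper's VoI objectives; the two-state toy MDP is hypothetical):

```python
def value_iteration(n_states, actions, transition, reward, gamma=0.9, tol=1e-8):
    """Generic value iteration: V(s) = max_a [R(s,a) + gamma * sum_s' P(s'|s,a) V(s')]."""
    V = [0.0] * n_states
    while True:
        V_new = []
        for s in range(n_states):
            V_new.append(max(
                reward(s, a) + gamma * sum(p * V[s2] for s2, p in transition(s, a))
                for a in actions
            ))
        if max(abs(a - b) for a, b in zip(V, V_new)) < tol:
            return V_new
        V = V_new

# Toy MDP: action 0 stays in place, action 1 moves to the other state;
# only staying in state 1 yields reward.
trans = lambda s, a: [(s if a == 0 else 1 - s, 1.0)]
rew = lambda s, a: 1.0 if (s == 1 and a == 0) else 0.0
print(value_iteration(2, [0, 1], trans, rew))  # ≈ [9.0, 10.0]
```

The paper's contribution sits on top of this machinery: its VoI objectives reshape the reward so the planner trades off goal progress against the information an adversarial observer can extract from the path.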


Source: Arxiv

Evaluating machine learning models for predicting pesticides toxicity to honey bees

  • Small molecules play a critical role in the biomedical, environmental, and agrochemical domains.
  • This work focuses on ApisTox, the most comprehensive dataset of experimentally validated chemical toxicity to the honey bee (Apis mellifera).
  • The evaluation of ApisTox using various machine learning approaches reveals that it represents a distinct chemical space.
  • The limited generalizability of current state-of-the-art algorithms trained solely on biomedical data highlights the need for targeted model development in the agrochemical domain.


Source: Arxiv

NoProp: Training Neural Networks without Back-propagation or Forward-propagation

  • The paper introduces a new learning method named NoProp, which does not rely on either forward or backward propagation in deep learning.
  • NoProp takes inspiration from diffusion and flow matching methods to independently learn to denoise a noisy target at each layer.
  • The method demonstrates superior accuracy, ease of use, and computational efficiency compared to other back-propagation-free methods on image classification benchmarks such as MNIST, CIFAR-10, and CIFAR-100.
  • NoProp alters the traditional gradient-based learning paradigm, enabling more efficient distributed learning and potentially impacting other characteristics of the learning process.
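The per-layer idea can be caricatured numerically: each "layer" below is a scalar denoiser trained only on its own local loss, with no gradient flowing between layers (a loose illustrative sketch under strong simplifications, not the NoProp algorithm itself):

```python
import random

random.seed(0)

def train_local_denoiser(noise_std, lr=0.02, steps=3000):
    """Train one scalar 'layer' to denoise z = target + noise toward the target,
    using only its own local MSE gradient (no backprop through other layers)."""
    w = 0.0
    for _ in range(steps):
        target = random.choice([-1.0, 1.0])
        z = target + random.gauss(0.0, noise_std)
        grad = 2 * (w * z - target) * z   # local gradient for this layer only
        w -= lr * grad
    return w

# Layers that see less noise learn weights closer to 1 (identity denoising);
# the optimum for this toy setup is 1 / (1 + noise_std**2).
layers = [train_local_denoiser(std) for std in (0.1, 0.5, 1.0)]
print([round(w, 2) for w in layers])
```

Each call is entirely independent, which is what makes the paradigm attractive for distributed training: layers can be trained in parallel without waiting for a global backward pass.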


Source: Arxiv

ORAL: Prompting Your Large-Scale LoRAs via Conditional Recurrent Diffusion

  • Parameter generation has emerged as a novel paradigm for neural network development, offering an alternative to traditional neural network training by synthesizing high-quality model weights directly.
  • In this paper, a novel conditional recurrent diffusion framework called ORAL is introduced, which addresses the limitations of existing methods in achieving scalability and controllability.
  • ORAL incorporates a novel conditioning mechanism to generate task-specific Low-Rank Adaptation (LoRA) parameters that can seamlessly transfer across evolving language models.
  • Extensive experiments show that ORAL generates high-quality LoRA parameters, achieving comparable or superior performance to vanilla trained counterparts across various language, vision, and multimodal tasks.
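The LoRA parameters that ORAL generates plug into the usual low-rank update, which can be sketched as follows (standard LoRA merging, not ORAL's diffusion framework; the matrices are toy values):

```python
def apply_lora(W, A, B, alpha=16, rank=2):
    """Merge a LoRA update into a weight matrix: W' = W + (alpha / rank) * B @ A.

    W: d_out x d_in, B: d_out x rank, A: rank x d_in (plain lists for clarity).
    """
    scale = alpha / rank
    d_out, d_in = len(W), len(W[0])
    return [
        [
            W[i][j] + scale * sum(B[i][r] * A[r][j] for r in range(rank))
            for j in range(d_in)
        ]
        for i in range(d_out)
    ]

W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.1, 0.0], [0.0, 0.1]]
print(apply_lora(W, A, B, alpha=2, rank=2))  # [[1.1, 0.0], [0.0, 1.1]]
```

Because only `A` and `B` vary per task while `W` stays frozen, generating LoRA parameters (rather than full weights) keeps the synthesis problem small enough to be tractable.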


Source: Arxiv

SQuat: Subspace-orthogonal KV Cache Quantization

  • Researchers propose SQuat (Subspace-orthogonal KV cache quantization) to reduce the memory footprint of the key-value (KV) cache used during LLM decoding.
  • SQuat constructs a subspace spanned by query tensors to capture critical task-related information.
  • SQuat enforces orthogonality between (de)quantized and original keys in the subspace, minimizing the impact of quantization errors.
  • The method achieves reduced memory usage, improved throughput, and better benchmark scores compared to existing KV cache quantization algorithms.
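The core idea, keeping quantization error out of the query subspace, can be sketched for a single direction (an illustrative sketch under simplifying assumptions, not the SQuat algorithm; `orthogonalize_error` is a hypothetical helper):

```python
def quantize(v, step=0.5):
    """Uniform scalar quantization: round each entry to the nearest step."""
    return [round(x / step) * step for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthogonalize_error(key, key_q, u):
    """Shift the quantized key so its error has no component along the unit
    direction u: afterwards dot(key_q - key, u) == 0, so query-key scores
    along u are unaffected by quantization."""
    err_along_u = dot([q - k for q, k in zip(key_q, key)], u)
    return [q - err_along_u * ui for q, ui in zip(key_q, u)]

key = [0.9, 0.2]
u = [1.0, 0.0]          # stand-in for a direction spanned by query tensors
kq = quantize(key)      # [1.0, 0.0]
kq_sq = orthogonalize_error(key, kq, u)
print(kq, kq_sq)        # the error along u is removed from the corrected key
```

SQuat generalizes this to a full subspace spanned by query tensors, so attention scores, which are dot products of queries with keys, are shielded from the quantization error.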


Source: Arxiv

Which LIME should I trust? Concepts, Challenges, and Solutions

  • Explainable Artificial Intelligence (XAI) is crucial for fostering trust and detecting potential misbehavior of opaque models.
  • LIME (Local Interpretable Model-agnostic Explanations) is a popular model-agnostic approach for generating explanations of black-box models.
  • LIME faces challenges related to fidelity, stability, and applicability to domain-specific problems.
  • A survey has been conducted to comprehensively explore and collect LIME's foundational concepts and known limitations, categorize and compare its enhancements, and offer a structured taxonomy for future research and practical application.
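The LIME recipe the survey examines (perturb around an instance, weight samples by proximity, fit a local linear surrogate) can be sketched as follows (a minimal two-feature sketch, not the reference LIME implementation, which perturbs interpretable binary masks; `black_box` is a hypothetical model):

```python
import math
import random

random.seed(1)

def black_box(x):
    # Opaque model: depends strongly on feature 0, weakly on feature 1.
    return 3.0 * x[0] + 0.2 * x[1] + 0.05 * x[0] * x[1]

def lime_weights(model, x0, n_samples=500, sigma=0.5):
    """Fit a proximity-weighted linear surrogate around x0."""
    X, y, w = [], [], []
    for _ in range(n_samples):
        x = [xi + random.gauss(0.0, sigma) for xi in x0]
        dist2 = sum((a - b) ** 2 for a, b in zip(x, x0))
        X.append([1.0] + x)                       # intercept term
        y.append(model(x))
        w.append(math.exp(-dist2 / (2 * sigma ** 2)))
    # Solve the weighted normal equations (3x3) by Gaussian elimination.
    n = 3
    A = [[sum(wi * Xi[i] * Xi[j] for wi, Xi in zip(w, X)) for j in range(n)]
         for i in range(n)]
    b = [sum(wi * Xi[i] * yi for wi, Xi, yi in zip(w, X, y)) for i in range(n)]
    for col in range(n):                          # naive elimination, fine for 3x3
        for r2 in range(col + 1, n):
            f = A[r2][col] / A[col][col]
            A[r2] = [a - f * c for a, c in zip(A[r2], A[col])]
            b[r2] -= f * b[col]
    coef = [0.0] * n
    for i in reversed(range(n)):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, n))) / A[i][i]
    return coef[1:]                               # drop the intercept

print(lime_weights(black_box, [1.0, 1.0]))  # ≈ [3.05, 0.25]: local gradient at x0
```

The fidelity and stability issues the survey catalogs show up even in this sketch: rerunning with a different seed, kernel width `sigma`, or sample count changes the surrogate, which is exactly why LIME explanations can disagree across runs.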


Source: Arxiv

Effectively Controlling Reasoning Models through Thinking Intervention

  • Reasoning-enhanced large language models (LLMs) generate intermediate reasoning steps prior to generating final answers, excelling in complex problem-solving.
  • Thinking Intervention is a novel paradigm designed to guide the internal reasoning processes of LLMs by strategically inserting or revising specific thinking tokens.
  • Comprehensive evaluations show that Thinking Intervention outperforms baseline prompting approaches, achieving significant improvements in instruction following, instruction hierarchy, and safety alignment tasks.
  • The research on Thinking Intervention offers a promising new avenue for controlling reasoning LLMs.


Source: Arxiv

Prompt, Divide, and Conquer: Bypassing Large Language Model Safety Filters via Segmented and Distributed Prompt Processing

  • Researchers have developed a framework to bypass safety filters of large language models (LLMs) and generate malicious code.
  • The framework employs distributed prompt processing and iterative refinements to achieve a 73.2% success rate (SR) in generating malicious code.
  • Comparative analysis shows that traditional single-LLM judge evaluation overestimates SRs compared to the LLM jury system.
  • The distributed architecture improves SRs by 12% compared to the non-distributed approach.


Source: Arxiv

Truth in Text: A Meta-Analysis of ML-Based Cyber Information Influence Detection Approaches

  • Cyber information influence, or disinformation, is a significant threat to social progress and government stability.
  • ML techniques, including traditional ML algorithms and deep learning models, are being used to detect disinformation in online media.
  • A two-stage meta-analysis was conducted to assess the effectiveness of ML models in detecting disinformation.
  • The majority of the ML detection techniques sampled achieved over 80% accuracy, with a mean sample effectiveness of 79.18% accuracy.


Source: Arxiv

Enhancing Aviation Communication Transcription: Fine-Tuning Distil-Whisper with LoRA

  • The paper applies Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, to Distil-Whisper, a computationally lighter version of the Whisper automatic speech recognition model, for aviation communication transcription.
  • The authors used the Air Traffic Control Corpus dataset and ran a grid search with 5-fold cross-validation to optimize Distil-Whisper's hyperparameters.
  • The fine-tuned model achieved an average word error rate of 3.86% across five folds, indicating its potential for accurate transcription of aviation communication.
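The reported word error rate follows the standard word-level edit-distance definition, which can be sketched directly (a generic WER implementation, not the authors' evaluation code; the transcript pair is hypothetical):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed via word-level Levenshtein distance with a rolling DP row."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # delete a reference word
                           cur[j - 1] + 1,           # insert a hypothesis word
                           prev[j - 1] + (r != h)))  # substitute or match
        prev = cur
    return prev[-1] / len(ref)

print(word_error_rate("cleared for takeoff runway two seven",
                      "cleared for takeoff runway two eight"))  # 1/6 ≈ 0.1667
```

A 3.86% WER thus means roughly one word-level error per 26 reference words, which is the level of fidelity needed before transcripts of air traffic control audio become operationally useful.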


Source: Arxiv

Modeling speech emotion with label variance and analyzing performance across speakers and unseen acoustic conditions

  • Spontaneous speech emotion data often have uncertainty in labels due to grader opinion variation.
  • Using the probability density function of emotion grades as targets instead of consensus grades improves performance on benchmark evaluation sets.
  • Saliency-driven foundation model representation selection helps train a state-of-the-art speech emotion model for both dimensional and categorical emotion recognition.
  • Performance evaluation across multiple test-sets, along with analysis across gender and speakers, is necessary to assess the usefulness of emotion models.
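Training against the distribution of grader opinions, rather than a single consensus grade, can be sketched with a soft-label cross-entropy (an illustrative sketch with hypothetical grade counts, not the paper's training objective):

```python
import math

def soft_label_loss(pred_probs, grade_counts):
    """Cross-entropy against the empirical distribution of grader opinions
    instead of a one-hot consensus label."""
    total = sum(grade_counts.values())
    target = {g: c / total for g, c in grade_counts.items()}
    return -sum(p * math.log(pred_probs[g]) for g, p in target.items() if p > 0)

# Five graders split 3/2 between "happy" and "neutral" for one utterance.
counts = {"happy": 3, "neutral": 2, "sad": 0}
matched = {"happy": 0.6, "neutral": 0.4, "sad": 0.0}      # matches the grader split
consensus = {"happy": 0.99, "neutral": 0.01, "sad": 0.0}  # overconfident one-hot fit
print(soft_label_loss(matched, counts) < soft_label_loss(consensus, counts))  # True
```

Under this loss a model that reproduces the 3/2 grader split is rewarded over one that collapses onto the majority grade, which is the mechanism behind the bullet's claim that density targets improve benchmark performance.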


Source: Arxiv

Risk-Calibrated Affective Speech Recognition via Conformal Coverage Guarantees: A Stochastic Calibrative Framework for Emergent Uncertainty Quantification

  • Traffic safety challenges arising from extreme driver emotions highlight the urgent need for reliable emotion recognition systems.
  • Traditional deep learning approaches in speech emotion recognition suffer from overfitting and poorly calibrated confidence estimates.
  • A framework integrating Conformal Prediction (CP) and Risk Control is proposed, using Mel-spectrogram features processed through a pre-trained convolutional neural network.
  • The Risk Control framework enables task-specific adaptation through customizable loss functions, dynamically adjusting prediction set sizes while maintaining coverage guarantees.
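The split-conformal mechanics behind such coverage guarantees can be sketched in a few lines (a generic split-conformal sketch with hypothetical calibration scores, not the paper's risk-control framework):

```python
import math

def conformal_threshold(cal_scores, alpha=0.1):
    """Split-conformal quantile: with nonconformity scores from a held-out
    calibration set, sets built with this threshold cover the true label
    with probability >= 1 - alpha."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))   # conservative rank
    return sorted(cal_scores)[min(k, n) - 1]

def prediction_set(probs, threshold):
    """Include every emotion label whose nonconformity (1 - probability)
    stays under the calibrated threshold."""
    return [label for label, p in probs.items() if 1.0 - p <= threshold]

cal = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]  # 1 - p(true label)
t = conformal_threshold(cal, alpha=0.2)
print(t, prediction_set({"angry": 0.7, "calm": 0.25, "sad": 0.05}, t))
```

The guarantee holds regardless of how well calibrated the underlying network's probabilities are, which is why conformal methods are a natural fix for the overconfidence problem the bullets describe; the risk-control extension swaps the miscoverage rate for a task-specific loss.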


Source: Arxiv

Chirp Localization via Fine-Tuned Transformer Model: A Proof-of-Concept Study

  • Researchers have developed a fine-tuned Transformer model to detect and localize chirp-like patterns in EEG spectrograms, which are important biomarkers for seizure dynamics.
  • The study utilized synthetic spectrograms with chirp parameters to create a benchmark for chirp localization.
  • The Vision Transformer (ViT) model was adapted for regression to predict chirp parameters, and attention layers were fine-tuned using Low-Rank Adaptation (LoRA).
  • The model achieved a strong alignment between predicted and actual labels, with a correlation of 0.9841 for chirp start time.
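The reported alignment is a correlation between predicted and actual chirp parameters; the metric itself is Pearson's r (a generic implementation; the example values are hypothetical, not the study's data):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between predicted and actual values, the kind of
    metric behind the reported 0.9841 alignment for chirp start time."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

actual = [0.0, 0.5, 1.0, 1.5, 2.0]      # hypothetical chirp start times (s)
predicted = [0.1, 0.45, 1.05, 1.4, 2.1]
print(round(pearson_r(actual, predicted), 4))  # 0.9934
```

Note that a high r measures linear alignment, not absolute accuracy: predictions with a constant offset would still correlate perfectly, so error metrics are usually reported alongside it.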


Source: Arxiv

A Large-Scale Vision-Language Dataset Derived from Open Scientific Literature to Advance Biomedical Generalist AI

  • A large-scale vision-language dataset derived from open scientific literature, Biomedica, has been introduced to advance biomedical generalist AI.
  • The dataset contains over 6 million scientific articles, 24 million image-text pairs, and 27 metadata fields, including expert human annotations.
  • Scalable streaming and search APIs are provided for easy access to the dataset, facilitating seamless integration with AI systems.
  • The utility of the Biomedica dataset has been demonstrated through the development of embedding models, chat-style models, and retrieval-augmented chat agents, outperforming previous open systems.
