techminis

A naukri.com initiative

ML News

Image Credit: Arxiv

HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation

  • Existing 2D-to-3D human pose estimation methods struggle with occlusion due to their limited input representations.
  • Hierarchical Pose AutoRegressive Transformer (HiPART) is proposed to address the occlusion issue in 2D-to-3D lifting.
  • HiPART generates hierarchical 2D dense poses from a sparse 2D pose using a two-stage generative densification method.
  • HiPART achieves state-of-the-art performance on single-frame-based 3D human pose estimation by improving robustness in occluded scenarios.


Large Language Models Are Better Logical Fallacy Reasoners with Counterargument, Explanation, and Goal-Aware Prompt Formulation

  • Large Language Models (LLMs) have improved our ability to process complex language, but detecting logical fallacies remains a challenge.
  • A study introduces a novel prompt formulation approach for logical fallacy detection.
  • The approach incorporates counterarguments, explanations, and goals to enrich the input text.
  • The method shows substantial improvements over previous models in both supervised and unsupervised settings.
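As a rough sketch of the idea (the paper's exact template is not reproduced here, and the field names below are illustrative), a prompt enriched with a counterargument, an explanation, and a goal might be assembled like this:

```python
def build_fallacy_prompt(text, counterargument, explanation, goal):
    """Assemble an enriched prompt for logical-fallacy detection.

    The concrete wording is a placeholder; the point is that the input
    text is augmented with the three extra fields before being sent to
    the LLM.
    """
    return (
        "Determine whether the following argument contains a logical fallacy.\n"
        f"Argument: {text}\n"
        f"Counterargument: {counterargument}\n"
        f"Explanation: {explanation}\n"
        f"Goal: {goal}\n"
        "Answer with the fallacy type or 'none'."
    )

prompt = build_fallacy_prompt(
    "Everyone buys this phone, so it must be the best.",
    "Popularity does not establish quality.",
    "The claim infers merit purely from how many people hold it.",
    "Classify the reasoning error, if any.",
)
```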


KernelDNA: Dynamic Kernel Sharing via Decoupled Naive Adapters

  • Dynamic convolution enhances model capacity by adaptively combining multiple kernels.
  • KernelDNA is a lightweight convolution kernel plug-in that enables dynamic kernel specialization without altering the standard convolution structure.
  • KernelDNA achieves state-of-the-art accuracy-efficiency balance among dynamic convolution variants.
  • Codes for KernelDNA are available at https://github.com/haiduo/KernelDNA.


COSMIC: Clique-Oriented Semantic Multi-space Integration for Robust CLIP Test-Time Adaptation

  • Researchers propose COSMIC (Clique-Oriented Semantic Multi-space Integration for CLIP), a test-time adaptation framework for vision-language models (VLMs).
  • COSMIC enhances adaptability through multi-granular, cross-modal semantic caching and graph-based querying mechanisms.
  • The framework introduces Dual Semantics Graph (DSG) to capture rich semantic relationships by incorporating textual features, coarse-grained CLIP features, and fine-grained DINOv2 features.
  • The Clique Guided Hyper-class component leverages structured class relationships to enhance prediction robustness in COSMIC.


DGSAM: Domain Generalization via Individual Sharpness-Aware Minimization

  • A new domain generalization (DG) algorithm, Decreased-overhead Gradual Sharpness-Aware Minimization (DGSAM), has been introduced.
  • DGSAM aims to reduce sharpness consistently across domains while maintaining computational efficiency.
  • Experiments show that DGSAM outperforms state-of-the-art DG methods in terms of robustness and performance.
  • DGSAM reduces computational overhead compared to the popular approach of Sharpness-Aware Minimization (SAM).
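DGSAM itself is not reproduced here, but the vanilla SAM step it builds on (and whose overhead it reduces) can be sketched in a few lines: perturb the weights toward the worst case within a small ball, then update using the gradient at the perturbed point. A minimal scalar example, with illustrative step size and radius:

```python
def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization (SAM) step for a scalar parameter.

    SAM first climbs to the worst-case point within a rho-ball around w,
    then applies the gradient computed there to the original weights.
    """
    g = grad_fn(w)
    if g == 0.0:
        return w
    eps = rho * g / abs(g)       # ascent direction with norm rho
    g_adv = grad_fn(w + eps)     # gradient at the perturbed point
    return w - lr * g_adv

# Loss L(w) = w^2 with gradient 2w: repeated SAM steps approach the
# flat minimum at w = 0.
w = 1.0
for _ in range(50):
    w = sam_step(w, lambda w: 2.0 * w)
```

Note that SAM needs two gradient evaluations per step; reducing that overhead is exactly the efficiency angle the DGSAM bullet refers to.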


Speculative End-Turn Detector for Efficient Speech Chatbot Assistant

  • Spoken dialogue systems struggle with end-turn detection (ETD): distinguishing whether a user has finished a turn or is merely hesitating.
  • The ETD Dataset is introduced, which is the first public dataset for end-turn detection.
  • A collaborative inference framework, SpeculativeETD, is proposed to improve real-time ETD in resource-constrained environments.
  • SpeculativeETD significantly improves ETD accuracy while minimizing computation requirements.
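The paper's models are not reproduced here, but the collaborative (speculative) inference pattern can be sketched as a simple cascade: a cheap on-device model handles confident cases and defers to a larger model otherwise. The stub models and threshold below are illustrative only:

```python
def speculative_end_turn(audio_features, fast_model, slow_model, threshold=0.9):
    """Cascade inference: trust the fast model when it is confident,
    otherwise fall back to the larger, slower model.

    Returns (is_end_of_turn, used_fallback).
    """
    label, confidence = fast_model(audio_features)
    if confidence >= threshold:
        return label, False
    return slow_model(audio_features), True

# Stub models standing in for real ETD classifiers.
fast = lambda x: (True, 0.95) if x == "long_pause" else (True, 0.5)
slow = lambda x: False  # the larger model judges this a hesitation

confident = speculative_end_turn("long_pause", fast, slow)
deferred = speculative_end_turn("filler_um", fast, slow)
```

Only the ambiguous input pays the cost of the large model, which is the source of the computation savings.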


Semantic-Preserving Transformations as Mutation Operators: A Study on Their Effectiveness in Defect Detection

  • Recent advances in defect detection use language models.
  • A study was conducted to determine the effectiveness of semantic-preserving transformations in improving defect detection tools.
  • 28 publications with 94 different transformations were analyzed.
  • Reusing shared semantic-preserving transformations proved to be challenging and did not improve the accuracy of defect detection models.


Accelerated Stein Variational Gradient Flow

  • Accelerated Stein Variational Gradient Flow (ASVGD) is a new method for sampling from a target distribution.
  • It is an accelerated version of the existing Stein variational gradient descent (SVGD) method.
  • ASVGD is designed to be fast and efficient for high-dimensional sampling.
  • Numerical examples show the effectiveness of ASVGD compared to SVGD and other sampling methods.
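ASVGD's acceleration is not reproduced here, but the baseline SVGD update it accelerates can be sketched in one dimension with an RBF kernel; the step size and bandwidth below are illustrative:

```python
import math

def svgd_step(xs, grad_logp, eps=0.1, h=1.0):
    """One SVGD update for 1-D particles with an RBF kernel.

    Each particle is pushed by a kernel-weighted average of the score
    (driving term) plus a kernel-gradient term that keeps particles
    spread out (repulsive term).
    """
    n = len(xs)
    new = []
    for x in xs:
        phi = 0.0
        for xj in xs:
            k = math.exp(-((xj - x) ** 2) / (2 * h * h))
            phi += k * grad_logp(xj)        # driving term
            phi += -(xj - x) / (h * h) * k  # repulsive term
        new.append(x + eps * phi / n)
    return new

# Target N(0, 1), so grad log p(x) = -x: particles drift toward the
# mode while the repulsive term stops them from collapsing onto it.
xs = [-3.0, 2.0, 4.0]
for _ in range(100):
    xs = svgd_step(xs, lambda x: -x)
```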


Codehacks: A Dataset of Adversarial Tests for Competitive Programming Problems Obtained from Codeforces

  • Codehacks is a dataset of programming problems obtained from the Codeforces online judge platform.
  • The dataset includes 288,617 error-inducing test cases referred to as 'hacks' for 5,578 programming problems.
  • Each problem in the dataset is accompanied by a natural language description and the source code for 2,196 submitted solutions.
  • The dataset aims to support data-driven creation of test suites, particularly for testing software synthesized from large language models.
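As an illustration of how such hacks are used (the functions below are invented, not from the dataset): a hack succeeds when a candidate solution's output diverges from the reference on the error-inducing input.

```python
def reference_max(nums):
    """Reference solution: maximum of a list."""
    return max(nums)

def buggy_max(nums):
    """A submitted solution with a classic bug: the accumulator starts
    at 0, so it fails when every number is negative."""
    best = 0
    for n in nums:
        if n > best:
            best = n
    return best

def run_hack(hack_input, candidate, reference):
    """Replay an error-inducing test ('hack') against a candidate,
    Codeforces-style: the hack succeeds if the outputs differ."""
    return candidate(hack_input) != reference(hack_input)

# An ordinary test misses the bug; the hack exposes it.
ordinary = run_hack([1, 5, 3], buggy_max, reference_max)
hack = run_hack([-4, -2, -7], buggy_max, reference_max)
```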


Benchmarking Systematic Relational Reasoning with Large Language and Reasoning Models

  • Large Language Models (LLMs) struggle with systematic reasoning, even when performing well on certain tasks.
  • Post-training strategies based on reinforcement learning and chain-of-thought prompting have been seen as an improvement.
  • Little is known about the potential of Large Reasoning Models (LRMs) beyond mathematics and programming.
  • Overall, LLMs and LRMs still perform poorly, albeit better than random chance.


POINT$^{2}$: A Polymer Informatics Training and Testing Database

  • The integration of machine learning (ML) techniques has propelled the advancement of polymer informatics in predicting polymer properties and discovering high-performance materials.
  • However, the field lacks a standardized workflow that encompasses prediction accuracy, uncertainty quantification, ML interpretability, and polymer synthesizability.
  • To address these challenges, a comprehensive benchmark database and protocol called POINT$^{2}$ (POlymer INformatics Training and Testing) has been introduced.
  • The POINT$^{2}$ database provides a collection of ML models and polymer representations to achieve property predictions, uncertainty estimations, model interpretability, and template-based polymerization synthesizability.
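POINT$^{2}$'s actual models are not shown here; as a generic sketch of the kind of uncertainty estimation such a protocol covers, an ensemble's prediction spread can serve as a simple uncertainty signal:

```python
import math

def ensemble_predict(models, x):
    """Predict a property with an ensemble and report the spread of the
    member predictions as a simple uncertainty estimate (the exact UQ
    methods in POINT^2 may differ)."""
    preds = [m(x) for m in models]
    mean = sum(preds) / len(preds)
    var = sum((p - mean) ** 2 for p in preds) / len(preds)
    return mean, math.sqrt(var)

# Toy "property predictors" that roughly agree, giving low uncertainty;
# disagreement between members would inflate the reported std.
models = [lambda x: 2.0 * x, lambda x: 2.1 * x, lambda x: 1.9 * x]
mean, std = ensemble_predict(models, 10.0)
```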


Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model

  • DFI-OmniStereo is a novel omnidirectional stereo matching method that leverages a pre-trained depth foundation model for accurate depth estimation.
  • Omnidirectional depth perception is essential for mobile robotics applications, and camera-based setups provide a cost-effective solution.
  • Existing omnidirectional stereo matching approaches face limitations in accuracy due to the lack of real-world data.
  • DFI-OmniStereo achieves state-of-the-art results on the real-world Helvipad dataset, significantly reducing disparity mean absolute error (MAE) compared to previous methods.
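The reported metric itself is straightforward; a minimal sketch of disparity MAE over valid ground-truth pixels (the flattened lists below stand in for disparity maps):

```python
def disparity_mae(pred, gt, invalid=0.0):
    """Mean absolute error between predicted and ground-truth disparity
    values, skipping pixels without a valid ground-truth disparity."""
    errors = [abs(p - g) for p, g in zip(pred, gt) if g != invalid]
    return sum(errors) / len(errors)

# Last pixel has no ground truth, so it is excluded from the average.
mae = disparity_mae([1.0, 2.5, 4.0, 9.9], [1.5, 2.0, 5.0, 0.0])
```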


Question-Aware Knowledge Graph Prompting for Enhancing Large Language Models

  • Large Language Models (LLMs) struggle with tasks requiring external knowledge.
  • Knowledge Graphs (KGs) can enhance reasoning, but existing methods demand costly fine-tuning or retrieve noisy KG information.
  • Question-Aware Knowledge Graph Prompting (QAP) dynamically assesses KG relevance and incorporates question embeddings into reasoning.
  • Experimental results demonstrate that QAP outperforms state-of-the-art methods in Multiple Choice Question Answering tasks.
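QAP's relevance scoring is learned; as a rough stand-in for the idea of question-aware KG selection, a static cosine ranking of triple embeddings against the question embedding looks like this (the embeddings below are made up):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_triples(question_emb, triples):
    """Order KG triples most-relevant-first by similarity to the
    question embedding. QAP's actual scoring is learned end-to-end;
    this static ranking only illustrates the selection idea."""
    return sorted(triples, key=lambda t: cosine(question_emb, t[1]), reverse=True)

question = [1.0, 0.0]
triples = [("capital_of", [0.0, 1.0]), ("born_in", [0.9, 0.1])]
ranked = rank_triples(question, triples)
```

Filtering by such a relevance score is one way to keep noisy KG facts out of the prompt.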


Addressing Model Overcomplexity in Drug-Drug Interaction Prediction With Molecular Fingerprints

  • Accurately predicting drug-drug interactions (DDIs) is crucial for pharmaceutical research and clinical safety.
  • This study proposes a simpler approach using molecular representations like Morgan fingerprints, graph-based embeddings, and transformer-derived embeddings.
  • The combination of these representations achieves competitive performance in DDI prediction tasks.
  • The study highlights the importance of dataset curation and progressive complexity scaling in drug interaction prediction models.
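Morgan fingerprints are typically compared with Tanimoto similarity; a minimal sketch over hypothetical bit sets (real fingerprints would come from a cheminformatics toolkit such as RDKit):

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of
    'on' bit positions: |intersection| / |union|."""
    if not fp_a and not fp_b:
        return 1.0
    inter = len(fp_a & fp_b)
    return inter / (len(fp_a) + len(fp_b) - inter)

# Hypothetical bit sets for two drugs, not real fingerprints.
drug_a = {1, 4, 7, 9}
drug_b = {1, 4, 8}
sim = tanimoto(drug_a, drug_b)
```

Pairwise similarities like this are the kind of lightweight feature a fingerprint-based DDI model can consume in place of a heavy end-to-end architecture.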


DASH: Detection and Assessment of Systematic Hallucinations of VLMs

  • DASH (Detection and Assessment of Systematic Hallucinations) is an automatic, large-scale pipeline designed to identify systematic hallucinations of Vision-Language Models (VLMs) on real-world images in an open-world setting.
  • The pipeline utilizes DASH-OPT for image-based retrieval, optimizing over the 'natural image manifold' to generate images that mislead the VLM and expose its object hallucinations.
  • Applying DASH to PaliGemma and two LLaVA-NeXT models, it identifies more than 19k clusters with 950k images where the VLM hallucinates an object across 380 object classes.
  • The study also demonstrates that fine-tuning PaliGemma with the model-specific images obtained using DASH mitigates object hallucinations.
