techminis

A naukri.com initiative

ML News

Arxiv

Hierarchical Label Propagation: A Model-Size-Dependent Performance Booster for AudioSet Tagging

  • AudioSet is a large dataset used for audio tagging, containing 2 million samples labeled with 527 event categories.
  • Inconsistent annotations in AudioSet cause some categories that are actually present in a clip to be labeled as negative.
  • Hierarchical Label Propagation (HLP) addresses this by propagating positive labels up the AudioSet ontology hierarchy.
  • HLP increases the number of positive labels per audio clip and improves the performance of different model architectures.
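
A minimal sketch of hierarchical label propagation, assuming it works by also marking every ancestor of a positive class as positive. The `PARENTS` mapping is a hypothetical, simplified fragment (not the full 527-class AudioSet ontology), and `propagate_labels` is an illustrative name, not the authors' code.

```python
from typing import Dict, List, Set

# Hypothetical, simplified ontology fragment: child class -> parent classes.
# A class may have several parents in a real ontology, hence the lists.
PARENTS: Dict[str, List[str]] = {
    "Guitar": ["Plucked string instrument"],
    "Plucked string instrument": ["Musical instrument"],
    "Musical instrument": ["Music"],
    "Dog": ["Domestic animals, pets"],
    "Domestic animals, pets": ["Animal"],
}


def propagate_labels(positives: Set[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Return the input positives plus every ancestor reachable through `parents`."""
    result = set(positives)
    stack = list(positives)
    while stack:
        label = stack.pop()
        for parent in parents.get(label, []):
            if parent not in result:  # visit each ancestor only once
                result.add(parent)
                stack.append(parent)
    return result


clip_labels = {"Guitar", "Dog"}
print(sorted(propagate_labels(clip_labels, PARENTS)))
```

In this toy clip, two original tags become seven positive labels, which illustrates how propagation raises the number of positives per clip.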

Arxiv

Learning from spatially inhomogenous data: resolution-adaptive convolutions for multiple sclerosis lesion segmentation

  • Researchers have developed a network architecture for segmenting multiple sclerosis lesions from spatially inhomogeneous MRI data without resampling.
  • The network is based on the e3nn framework and uses a spherical harmonic parameterization of its convolutional kernels, which allows the kernels to be resampled to the voxel dimensions of each input.
  • The network outperformed a standard U-Net when tested on both 2D and most 3D cases of multiple sclerosis lesions.
  • The approach demonstrates the ability to learn from various combinations of voxel sizes and generalize well to testing cases with different image resolutions.
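
The idea of a kernel that is resampled to the input's voxel dimensions can be illustrated with a simplified sketch: the kernel is a continuous function of the physical offset (in mm) and is sampled on a grid with the input's voxel spacing. This keeps only an isotropic, radial term built from Gaussian radial basis functions; a full spherical-harmonic parameterization adds direction-dependent (higher-degree) terms. All function names and values here are hypothetical.

```python
from typing import Tuple

import numpy as np


def radial_basis(r: np.ndarray, centers: np.ndarray, width: float) -> np.ndarray:
    """Gaussian radial basis functions evaluated at distances r (in mm)."""
    return np.exp(-((r[..., None] - centers) ** 2) / (2.0 * width**2))


def sample_kernel(
    weights: np.ndarray,
    centers: np.ndarray,
    width: float,
    spacing_mm: Tuple[float, float, float],
    radius_mm: float,
) -> np.ndarray:
    """Sample a continuous isotropic kernel on a voxel grid with the given spacing.

    The parameters (`weights`, `centers`, `width`) live in physical units, so the
    same kernel can be resampled for any voxel size instead of resampling the image.
    """
    half = [int(np.floor(radius_mm / s)) for s in spacing_mm]  # grid points per side
    axes = [np.arange(-h, h + 1) * s for h, s in zip(half, spacing_mm)]
    zz, yy, xx = np.meshgrid(*axes, indexing="ij")  # physical offsets in mm
    r = np.sqrt(xx**2 + yy**2 + zz**2)
    kernel = radial_basis(r, centers, width) @ weights
    kernel[r > radius_mm] = 0.0  # restrict support to a ball of radius radius_mm
    return kernel


weights = np.array([1.0, -0.5, 0.25])  # hypothetical learned weights
centers = np.array([0.0, 1.5, 3.0])
k_iso = sample_kernel(weights, centers, 1.0, (1.0, 1.0, 1.0), 3.0)  # 1 mm isotropic voxels
k_aniso = sample_kernel(weights, centers, 1.0, (3.0, 1.0, 1.0), 3.0)  # 3 mm thick slices
print(k_iso.shape, k_aniso.shape)
```

The same three weights yield a 7×7×7 kernel for 1 mm isotropic voxels and a 3×7×7 kernel for 3 mm slices, so the image itself never has to be resampled.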

Arxiv

Shape Generation via Weight Space Learning

  • Foundation models for 3D shape generation can encode rich geometric priors across global and local dimensions.
  • Leveraging these priors for downstream tasks is challenging in real-world scenarios with scarce or noisy data.
  • The authors propose treating the weight space of a 3D shape-generative model as a data modality that can be explored directly.
  • The high-dimensional weight space can modulate topological properties or fine-grained part features, enabling new approaches for 3D shape generation and specialized fine-tuning.

Arxiv

M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization?

  • Researchers investigate whether Large Vision-Language Models (LVLMs) genuinely comprehend interleaved image-text in document summarization.
  • Existing document understanding benchmarks often assess LVLMs using question-answer formats, which may not guarantee coverage of long-range dependencies.
  • A novel and challenging Multimodal Document Summarization Benchmark (M-DocSum-Bench) is introduced, which includes high-quality arXiv papers with interleaved multimodal summaries aligned with human preferences.
  • Leading LVLMs struggle to stay coherent and to integrate information accurately in long, interleaved contexts, frequently confuse similar images, and lack robustness.

Arxiv

Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment

  • Inference-time computation provides an important axis for scaling language model performance.
  • Naively scaling compute with techniques such as Best-of-N sampling can degrade performance because of reward hacking.
  • Theoretical analysis of inference-time alignment algorithms shows that the pre-trained policy's coverage is key to both performance and compute scaling.
  • The proposed InferenceTimePessimism algorithm mitigates reward hacking, achieves optimal performance, and is scaling-monotonic, meaning its performance does not degrade as more compute is spent.
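
The Best-of-N failure mode can be reproduced in a toy setting. The sketch implements plain Best-of-N selection; `true_quality` and `proxy_reward` are hypothetical functions standing in for the true objective and an imperfect reward model. It does not implement InferenceTimePessimism, whose details are in the paper.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def best_of_n(sample: Callable[[], T], reward: Callable[[T], float], n: int) -> T:
    """Draw n candidates from `sample` and return the one with the highest reward."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[T] = [sample() for _ in range(n)]
    return max(candidates, key=reward)


def true_quality(x: float) -> float:
    # Hypothetical ground truth: quality grows with x up to 0.8, then collapses.
    return x if x <= 0.8 else 0.0


def proxy_reward(x: float) -> float:
    # Hypothetical imperfect reward model: it keeps rewarding larger x.
    return x


def mean_true_quality(n: int, trials: int = 2000, seed: int = 0) -> float:
    """Average true quality of the Best-of-N pick, with candidates x ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    picks = [best_of_n(rng.random, proxy_reward, n) for _ in range(trials)]
    return sum(true_quality(x) for x in picks) / trials


for n in (1, 4, 64):
    print(f"N={n:>2}: mean true quality {mean_true_quality(n):.3f}")
```

With these toy functions the average true quality falls as N grows (roughly 0.32 for N = 1 versus close to 0 for N = 64), because a larger pool makes it more likely that the selected candidate exploits the proxy reward.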

Arxiv

StarFlow: Generating Structured Workflow Outputs From Sketch Images

  • Workflows are fundamental for automation in enterprise platforms, but building them can be complex.
  • A new framework called StarFlow uses vision-language models to automatically generate structured workflows from visual inputs.
  • To tackle the challenges of this task, the authors curated a diverse dataset of workflow diagrams for training and evaluation.
  • Results show that fine-tuning improves structured workflow generation, with the fine-tuned models outperforming large vision-language models on this task.

Arxiv

Exponentially Weighted Instance-Aware Repeat Factor Sampling for Long-Tailed Object Detection Model Training in Unmanned Aerial Vehicles Surveillance Scenarios

  • Object detection models often struggle with class imbalance, where rare categories appear significantly less frequently than common ones.
  • Existing rebalancing strategies for class imbalance, such as Repeat Factor Sampling (RFS) and Instance-Aware Repeat Factor Sampling (IRFS), have limitations in long-tailed distributions.
  • This work introduces Exponentially Weighted Instance-Aware Repeat Factor Sampling (E-IRFS), an extension of IRFS that applies exponential scaling to better rebalance rare and frequent classes in object detection.
  • E-IRFS improves detection performance by 22% over the baseline, outperforming RFS and IRFS, especially for rare categories, in resource-constrained environments like UAV-based emergency monitoring.
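
IRFS and E-IRFS build on Repeat Factor Sampling. For reference, the sketch implements the standard RFS rule introduced with the LVIS dataset: each category gets r_c = max(1, sqrt(t / f_c)), where f_c is the fraction of training images containing category c and t is a threshold, and each image is repeated according to the largest r_c among its categories. The instance-aware and exponentially weighted variants change how the category factor is computed; their exact formulas are not given here, so they are not implemented. The toy labels are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def category_repeat_factors(image_labels: List[Set[str]], t: float) -> Dict[str, float]:
    """r_c = max(1, sqrt(t / f_c)), with f_c = fraction of images that contain class c."""
    num_images = len(image_labels)
    counts = Counter(c for labels in image_labels for c in labels)  # images per class
    return {c: max(1.0, math.sqrt(t / (k / num_images))) for c, k in counts.items()}


def image_repeat_factors(image_labels: List[Set[str]], t: float) -> List[float]:
    """r_i = max of r_c over the classes in image i (1.0 for an image with no labels)."""
    r_c = category_repeat_factors(image_labels, t)
    return [max((r_c[c] for c in labels), default=1.0) for labels in image_labels]


# Hypothetical toy dataset: 10 images, "ambulance" is the rare class.
images = [{"car", "ambulance"}] + [{"car", "person"}] * 4 + [{"car"}] * 4 + [{"person"}]
print([round(r, 2) for r in image_repeat_factors(images, t=0.3)])
```

Here "ambulance" appears in 1 of 10 images, so with t = 0.3 its image gets a repeat factor of sqrt(3) ≈ 1.73, while images containing only frequent classes keep a factor of 1. Training code typically turns such fractional factors into integer repeats per epoch, for example by stochastic rounding.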

Arxiv

Multimodal Data Integration for Sustainable Indoor Gardening: Tracking Anyplant with Time Series Foundation Model

  • Indoor gardening is growing in popularity for urban food security and sustainability.
  • Advancements in IoT technologies and sustainable innovations are driving the growth of urban farming.
  • A novel framework integrating computer vision, machine learning, and environmental sensing is proposed for automated monitoring of plant health and growth.
  • The framework combines RGB imagery, plant phenotyping data, and environmental factors to predict plant water stress in a controlled growth environment.

Arxiv

Lobster: A GPU-Accelerated Framework for Neurosymbolic Programming

  • Lobster is a GPU-accelerated framework for neurosymbolic programming that combines deep learning with symbolic reasoning.
  • Lobster maps a neurosymbolic language based on Datalog to the GPU programming paradigm via compilation to an intermediate language called APM.
  • This design makes Lobster both flexible, supporting different modes of reasoning, and performant, thanks to new optimization passes.
  • Lobster achieves an average speedup of 5.3x over Scallop, a state-of-the-art neurosymbolic framework, and enables scaling of neurosymbolic solutions to previously infeasible tasks.
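
Lobster's input language is based on Datalog. For readers unfamiliar with Datalog, the sketch evaluates a classic recursive program, transitive closure over an `edge` relation, by semi-naive fixpoint iteration on the CPU in plain Python. It illustrates Datalog semantics only; it does not model APM, GPU execution, or Lobster's reasoning modes, and the relation names are hypothetical.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def transitive_closure(edges: Set[Edge]) -> Set[Edge]:
    """Evaluate the Datalog program

        path(x, y) :- edge(x, y).
        path(x, z) :- path(x, y), edge(y, z).

    with semi-naive iteration: each round joins only the facts derived in the
    previous round (the delta) with `edge`, until no new facts appear.
    """
    path = set(edges)
    delta = set(edges)
    while delta:
        derived = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
        delta = derived - path  # keep only facts not seen before
        path |= delta
    return path


edges = {("a", "b"), ("b", "c"), ("c", "d")}
print(sorted(transitive_closure(edges)))
```

Restricting each join to the delta avoids re-deriving old facts every round; the loop ends at the fixpoint, when a round produces nothing new.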

Arxiv

Differential Evolution for Grassmann Manifold Optimization: A Projection Approach

  • Researchers propose a novel evolutionary algorithm for optimizing real-valued objective functions on the Grassmann manifold.
  • Existing optimization techniques on the Grassmann manifold primarily rely on first- or second-order Riemannian methods, which struggle with nonconvex or multimodal landscapes.
  • The proposed algorithm adapts the Differential Evolution algorithm, a global, population-based optimization method, for effective operation on the Grassmann manifold.
  • The algorithm incorporates adaptive control-parameter schemes and introduces a projection mechanism that keeps candidate solutions on the manifold while letting the search explore beyond local neighborhoods.
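
A compact sketch of differential evolution with a projection step on the Grassmann manifold Gr(n, p), where a point is represented by an n×p matrix with orthonormal columns. Assumptions: the projection is a thin QR decomposition (the paper's projection mechanism may differ), F and CR are fixed rather than adaptive, and the objective -trace(X^T A X) is a hypothetical test problem chosen because its minimum (minus the sum of the p largest eigenvalues of A) is known.

```python
import numpy as np


def project(Y: np.ndarray) -> np.ndarray:
    """Return an orthonormal basis of the column span of Y (thin QR)."""
    Q, _ = np.linalg.qr(Y)
    return Q


def grassmann_de(f, n, p, pop_size=20, iters=300, F=0.5, CR=0.9, seed=0):
    """Minimise f over Gr(n, p) with DE/rand/1/bin followed by a projection step."""
    rng = np.random.default_rng(seed)
    pop = [project(rng.standard_normal((n, p))) for _ in range(pop_size)]
    fit = np.array([f(X) for X in pop])
    for _ in range(iters):
        for i in range(pop_size):
            # Three distinct individuals, all different from i.
            others = [j for j in range(pop_size) if j != i]
            a, b, c = rng.choice(others, size=3, replace=False)
            mutant = pop[a] + F * (pop[b] - pop[c])  # mutation in the ambient n x p space
            mask = rng.random((n, p)) < CR  # binomial crossover mask
            mask[rng.integers(n), rng.integers(p)] = True  # take at least one mutant entry
            trial = project(np.where(mask, mutant, pop[i]))  # back onto the manifold
            f_trial = f(trial)
            if f_trial <= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, f_trial
    best = int(np.argmin(fit))
    return pop[best], float(fit[best])


# Hypothetical test problem with a known optimum.
rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
A = M + M.T  # symmetric matrix


def objective(X: np.ndarray) -> float:
    return -float(np.trace(X.T @ A @ X))


X_best, f_best = grassmann_de(objective, n=6, p=2)
optimum = -float(np.sum(np.linalg.eigvalsh(A)[-2:]))  # minus the two largest eigenvalues
print(f"DE result {f_best:.4f}, known optimum {optimum:.4f}")
```

The objective depends only on the subspace spanned by X, since trace(Q^T X^T A X Q) = trace(X^T A X) for any orthogonal Q; that invariance is what makes it a well-defined function on the Grassmann manifold.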

Arxiv

Bresa: Bio-inspired Reflexive Safe Reinforcement Learning for Contact-Rich Robotic Tasks

  • Ensuring safety in reinforcement learning (RL)-based robotic systems is a critical challenge, especially in contact-rich tasks within unstructured environments.
  • Current safe RL approaches focus on high-level recovery mechanisms, neglecting low-level execution safety.
  • The proposed method, Bresa, decouples task learning from safety learning by adding a safety critic network that runs at a higher frequency than the task policy, enabling real-time intervention in unsafe conditions.
  • Bresa outperforms the baseline in multiple tasks and provides a reflexive safety mechanism that bridges the gap between planning and execution.

Arxiv

FACETS: Efficient Once-for-all Object Detection via Constrained Iterative Search

  • Neural Architecture Search (NAS) for deep learning object detection frameworks is computationally expensive due to the vast search space.
  • The proposed method, FACETS, is a unified iterative NAS technique that refines the architecture of all modules cyclically.
  • FACETS reduces the search space, preserves interdependencies among modules, and incorporates constraints based on the target device's computational budget.
  • FACETS achieves higher accuracy and faster search compared to progressive and single-module search strategies.
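
The cyclic refinement idea can be sketched as a coordinate-wise search: visit each module in turn, try its candidate choices while the other modules stay fixed, accept a change only if the full configuration still fits the compute budget and scores higher, and repeat until a full cycle brings no improvement. The module options, `toy_cost` (standing in for a FLOP or latency estimate), and `toy_score` (standing in for detection accuracy after training) are hypothetical; this is not FACETS's actual search space or procedure.

```python
from typing import Callable, Dict, List

Config = Dict[str, str]


def cyclic_search(
    options: Dict[str, List[str]],
    cost: Callable[[Config], float],
    score: Callable[[Config], float],
    budget: float,
    init: Config,
    max_rounds: int = 10,
) -> Config:
    """Refine one module at a time, cycling over all modules with the others fixed.

    The whole configuration is scored, so interactions between modules are kept;
    only configurations with cost(config) <= budget are accepted.
    """
    current = dict(init)
    if cost(current) > budget:
        raise ValueError("initial configuration exceeds the budget")
    for _ in range(max_rounds):
        changed = False
        for module, choices in options.items():
            for choice in choices:
                candidate = {**current, module: choice}
                if cost(candidate) <= budget and score(candidate) > score(current):
                    current, changed = candidate, True
        if not changed:  # a full cycle with no improvement: stop
            break
    return current


# Hypothetical per-module costs (e.g. GFLOPs) and accuracy contributions.
COST = {"backbone": {"small": 2, "medium": 4, "large": 8},
        "neck": {"none": 0, "fpn": 2},
        "head": {"light": 1, "heavy": 3}}
GAIN = {"backbone": {"small": 30, "medium": 36, "large": 40},
        "neck": {"none": 0, "fpn": 3},
        "head": {"light": 0, "heavy": 2}}


def toy_cost(cfg: Config) -> float:
    return sum(COST[m][c] for m, c in cfg.items())


def toy_score(cfg: Config) -> float:
    return sum(GAIN[m][c] for m, c in cfg.items())


options = {m: list(choices) for m, choices in COST.items()}
best = cyclic_search(options, toy_cost, toy_score, budget=9,
                     init={"backbone": "small", "neck": "none", "head": "light"})
print(best, toy_cost(best), toy_score(best))
```

Because it changes one module at a time, this sketch can stop at a local optimum: with the toy numbers it settles on a large backbone (score 40), even though medium + fpn + heavy (score 41) also fits the budget of 9.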

Arxiv

Enhancing Domain-Specific Encoder Models with LLM-Generated Data: How to Leverage Ontologies, and How to Do Without Them

  • Researchers investigate using LLM-generated data for continual pretraining of encoder models in specialized domains with limited training data.
  • They enrich domain-specific ontologies with LLM-generated data and pretrain the encoder as an ontology-informed embedding model for concept definitions.
  • The proposed approach proves effective in the scientific domain of invasion biology, achieving substantial improvements over standard LLM pretraining.
  • The study also explores the feasibility of applying this approach to domains without comprehensive ontologies, substituting ontological concepts with concepts extracted from scientific abstracts and establishing relationships between them using distributional statistics.

Arxiv

Tune It Up: Music Genre Transfer and Prediction

  • Researchers adapt and improve the CycleGAN model to perform music style transfer between the Jazz and Classical genres.
  • The goal is to make it easy to generate new songs, cover music in different genres, and reduce the arrangement work these processes require.
  • A music genre classifier, which reaches 87.7% accuracy, is developed to assess the performance of the transfer models.
  • As judged by this classifier, the best transfer models reach accuracies of 69.4% on the Jazz-to-Classical task and 39.3% on the Classical-to-Jazz task.

Arxiv

CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models

  • CoT-VLA is a method that incorporates explicit visual chain-of-thought reasoning into vision-language-action models.
  • It predicts future image frames autoregressively as visual goals and generates a short action sequence to achieve these goals.
  • CoT-VLA outperforms the state-of-the-art VLA model by 17% in real-world manipulation tasks and 6% in simulation benchmarks.
  • CoT-VLA is a state-of-the-art 7B VLA that can understand and generate visual and action tokens.
