techminis

A naukri.com initiative

ML News

Source: Arxiv

VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment

  • VELOCITI is a benchmark created to study Video-LLMs and assess compositional reasoning in short videos.
  • It disentangles and evaluates the comprehension of agents, actions, and their associations across multiple events.
  • Current video models such as LLaVA-OneVision and Gemini-1.5-Pro fall far short of human accuracy when classifying positive and negative captions.
  • The benchmark highlights challenges with ClassicVLE and multiple-choice evaluation, emphasizing the preference for StrictVLE.
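
One plausible reading of the strict criterion is sketched below (an illustration only, not the paper's exact protocol; the scoring and aggregation shown are assumptions): a test case counts as correct only when the positive caption outscores every negative caption, rather than merely ranking well on average.

```python
def strict_vle_accuracy(batches):
    """StrictVLE-style scoring (sketch): each item is a pair
    (positive_score, list_of_negative_scores); an item is correct only
    if the positive caption beats all of its negatives."""
    correct = sum(1 for pos, negs in batches if all(pos > n for n in negs))
    return correct / len(batches)

# Two toy cases: the first positive wins against all negatives;
# the second loses to one negative, so accuracy is 0.5.
acc = strict_vle_accuracy([(0.9, [0.1, 0.2]), (0.5, [0.6, 0.3])])
```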


Source: Arxiv

Coupled Input-Output Dimension Reduction: Application to Goal-oriented Bayesian Experimental Design and Global Sensitivity Analysis

  • A new method for joint dimension reduction of input and output spaces of a function is introduced.
  • Conventional methods focus on reducing either the input or output space, while this coupled approach supports simultaneous reduction of both.
  • The method is suitable for goal-oriented dimension reduction, where input or output quantities of interest are prescribed.
  • Applications include goal-oriented sensor placement and goal-oriented sensitivity analysis, solving combinatorial optimization problems by optimizing gradient-based bounds.
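
As a minimal illustration of the gradient-based idea (a toy sketch of activity-score-style sensitivity, not the paper's coupled method; `grad_f` is a hypothetical stand-in for the model's gradient):

```python
import numpy as np

def gradient_sensitivity(grad_f, samples):
    """Average squared partial derivatives of f over input samples;
    larger scores mark inputs that influence f more strongly."""
    grads = np.array([grad_f(x) for x in samples])
    return (grads ** 2).mean(axis=0)

# Toy model f(x) = 3*x0 + x1: the gradient is constant, so the
# sensitivity scores are exactly [9.0, 1.0].
grad_f = lambda x: np.array([3.0, 1.0])
samples = np.random.default_rng(0).normal(size=(100, 2))
scores = gradient_sensitivity(grad_f, samples)
```

Such scores give cheap upper-bound-style rankings of inputs, which is the kind of quantity the combinatorial sensor-placement and sensitivity problems above optimize over.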


Source: Arxiv

Cascade Reward Sampling for Efficient Decoding-Time Alignment

  • Cascade Reward Sampling (CARDS) is introduced to address efficiency bottlenecks in decoding-time alignment of large language models (LLMs).
  • CARDS utilizes a segment-level rejection sampling algorithm to minimize redundant computations of LLMs and reward models (RMs).
  • An uncertainty-based segmentation mechanism ensures accurate evaluation of RMs on incomplete segments.
  • Experimental results demonstrate that CARDS significantly improves decoding efficiency, alignment quality, and general utility.
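
The segment-level idea can be sketched as follows (a toy illustration under stated assumptions; `generate_segment` and `reward` are stand-ins for the LLM and reward model, not the paper's API):

```python
import itertools

def segment_rejection_sample(generate_segment, reward, threshold,
                             n_segments, max_tries=10):
    """Segment-level rejection sampling (CARDS-style sketch): score each
    candidate segment with a reward model and resample only that segment
    on rejection, instead of regenerating the whole sequence."""
    prefix = []
    for _ in range(n_segments):
        for _ in range(max_tries):
            candidate = generate_segment(prefix)
            if reward(prefix + [candidate]) >= threshold:
                break  # accept this segment
        prefix.append(candidate)  # keep the last candidate if none passed
    return prefix

# Toy stand-ins: the "LLM" emits -1, 0, 1, -1, ... in turn, and the
# "reward model" scores a sequence by its sum.
counter = itertools.count()
gen = lambda prefix: next(counter) % 3 - 1
result = segment_rejection_sample(gen, sum, threshold=0, n_segments=3)
```

Rejecting at segment granularity avoids discarding the work spent on already-accepted tokens, which is where the efficiency gain comes from.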


Source: Arxiv

ShapG: new feature importance method based on the Shapley value

  • A new Explainable Artificial Intelligence (XAI) method called ShapG (Explanations based on Shapley value for Graphs) has been developed for measuring feature importance.
  • ShapG is a model-agnostic global explanation method that defines an undirected graph based on the dataset and calculates feature importance using an approximated Shapley value.
  • Comparisons with existing XAI methods demonstrate that ShapG provides more accurate explanations and exhibits advantages in terms of computational efficiency.
  • The ShapG method has wide applicability and can improve the explainability and transparency of AI systems in various fields.
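
The Shapley computation that ShapG approximates can be shown exactly on a tiny example (this is the textbook definition, tractable only for a handful of features; ShapG's graph construction and sampling approximation are not reproduced here):

```python
from itertools import permutations

def shapley_values(features, value):
    """Exact Shapley values: average each feature's marginal
    contribution over all feature orderings."""
    phi = dict.fromkeys(features, 0.0)
    orders = list(permutations(features))
    for order in orders:
        coalition = set()
        for f in order:
            before = value(frozenset(coalition))
            coalition.add(f)
            phi[f] += value(frozenset(coalition)) - before
    return {f: total / len(orders) for f, total in phi.items()}

# For an additive value function, the Shapley value recovers each
# feature's own contribution exactly.
weights = {"a": 3.0, "b": 1.0, "c": 0.0}
phi = shapley_values(list(weights), lambda S: sum(weights[f] for f in S))
```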


Source: Arxiv

PQCache: Product Quantization-based KVCache for Long Context LLM Inference

  • A new method called PQCache is proposed to address the memory bottleneck in Large Language Models (LLMs) inference.
  • PQCache employs Product Quantization (PQ) to manage the Key-Value Cache (KVCache) in LLMs, maintaining model quality while ensuring low serving latency.
  • PQCache applies PQ to tokens' keys during the prefilling phase and uses PQ codes and centroids to fetch key-value pairs during the autoregressive decoding phase.
  • Extensive experiments show that PQCache achieves improved model effectiveness and efficiency, with a 4.60% score improvement over existing methods.
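
The product-quantization step itself can be sketched in a few lines (a generic PQ encode/decode, not PQCache's actual KVCache machinery; shapes and codebooks here are toy assumptions):

```python
import numpy as np

def pq_encode(x, codebooks):
    """Product quantization: split x into sub-vectors and store only
    the index of the nearest centroid in each sub-codebook."""
    subs = np.split(x, len(codebooks))
    return [int(np.argmin(((cb - s) ** 2).sum(axis=1)))
            for cb, s in zip(codebooks, subs)]

def pq_decode(codes, codebooks):
    """Approximate reconstruction from compact PQ codes and centroids."""
    return np.concatenate([cb[c] for cb, c in zip(codebooks, codes)])

# A 4-dim "key" compressed to two small codes; here the true
# sub-vectors happen to be centroids, so reconstruction is exact.
key = np.array([1.0, 2.0, 3.0, 4.0])
codebooks = [np.array([[0.0, 0.0], [1.0, 2.0]]),
             np.array([[3.0, 4.0], [9.0, 9.0]])]
codes = pq_decode_input = pq_encode(key, codebooks)
```

Storing codes instead of full key vectors is what lets the cache stay small while centroids support approximate nearest-key lookup at decode time.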


Source: Arxiv

SoftCVI: Contrastive variational inference with self-generated soft labels

  • Soft Contrastive Variational Inference (SoftCVI) is introduced, allowing a family of variational objectives to be derived through a contrastive estimation framework.
  • SoftCVI reframes the inference task as a contrastive estimation problem and does not require positive or negative samples.
  • SoftCVI learns by sampling the variational distribution and computing ground truth soft classification labels from the unnormalized posterior itself.
  • Empirical investigation shows that SoftCVI can form stable and effective objectives for Bayesian inference tasks, frequently outperforming other variational approaches.
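
The label-generation step can be sketched as follows (an illustration of computing soft labels from the unnormalized posterior; the full SoftCVI objective built on these labels is not reproduced here):

```python
import math

def soft_labels(samples, log_unnorm_posterior, log_q):
    """SoftCVI-style self-generated labels (sketch): for samples drawn
    from the variational distribution q, label each sample with its
    normalized weight under the unnormalized posterior."""
    logits = [log_unnorm_posterior(x) - log_q(x) for x in samples]
    m = max(logits)
    w = [math.exp(l - m) for l in logits]  # numerically stable softmax
    z = sum(w)
    return [wi / z for wi in w]

# When q already matches the posterior (up to the unknown normalizing
# constant), every sample gets the same label.
log_q = lambda x: -0.5 * x * x
labels = soft_labels([0.0, 1.0, 2.0], lambda x: log_q(x) + 5.0, log_q)
```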


Source: Arxiv

Diffusion-based subsurface CO2 multiphysics monitoring and forecasting

  • Carbon capture and storage (CCS) is important for mitigating greenhouse gas emissions from industrial outputs.
  • A novel subsurface multiphysics monitoring and forecasting framework using video diffusion models is proposed.
  • The proposed method successfully captures complex physical phenomena related to CO2 monitoring.
  • It can predict and invert subsurface elastic properties and CO2 saturation with consistency.


Source: Arxiv

LLM Stability: A detailed analysis with some surprises

  • LLM (large language model) practitioners commonly notice that outputs can vary for the same inputs under settings expected to be deterministic.
  • A systematic investigation into the non-determinism of five LLMs configured to be deterministic was performed.
  • Accuracy varied by up to 15% across naturally occurring runs, with gaps of up to 70% between best-case and worst-case performance.
  • Non-determinism in LLMs is considered essential to the efficient use of compute resources, indicating that this issue will persist.
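
A minimal way to quantify this kind of run-to-run variation (a generic stability check, not the paper's methodology) is to measure how often repeated runs of the same prompt agree:

```python
from collections import Counter

def stability_report(outputs):
    """Fraction of runs that produced the modal output for one prompt;
    1.0 means the model behaved deterministically on that input."""
    modal_count = Counter(outputs).most_common(1)[0][1]
    return modal_count / len(outputs)

# Five nominally deterministic runs of the same prompt that
# nonetheless disagree: the modal answer appears in 3 of 5 runs.
score = stability_report(["42", "42", "41", "42", "43"])
```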


Source: Arxiv

AI in radiological imaging of soft-tissue and bone tumours: a systematic review evaluating against CLAIM and FUTURE-AI guidelines

  • This systematic review evaluates Artificial Intelligence (AI) methods in radiological imaging for the diagnosis and prognosis of soft-tissue and bone tumours.
  • The review highlights challenges in clinical translation and assesses the alignment of studies with the CLAIM and FUTURE-AI guidelines.
  • Out of 325 evaluated articles, most studies performed moderately on CLAIM but poorly on FUTURE-AI.
  • The review suggests that AI developers should focus on design, development, evaluation, and data reproducibility to improve the clinical translation of AI methods.


Source: Arxiv

Learning out-of-time-ordered correlators with classical kernel methods

  • Out-of-Time Ordered Correlators (OTOCs) are commonly used to study information scrambling in quantum systems.
  • Directly computing OTOCs with classical computers is computationally expensive.
  • A study explores the use of classical kernel methods (KMs) to accurately learn OTOCs and related quantities of local one-dimensional quantum systems.
  • The proposed method can assist in evaluating OTOC functions of the parameterized quantum systems.


Source: Arxiv

Says Who? Effective Zero-Shot Annotation of Focalization

  • Researchers have tested the annotation of focalization in literature using large language models (LLMs).
  • The study found that LLMs performed comparably to trained human annotators, achieving an average F1 score of 84.79%.
  • The log probabilities output by GPT-family models reflected the difficulty of annotating specific literary excerpts.
  • The research highlights the potential of LLMs for computational literary studies and insights into focalization in literature.


Source: Arxiv

Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models

  • Researchers propose the use of latent space generative world models to address the covariate shift problem in autonomous driving.
  • The driving policy can effectively mitigate covariate shift without requiring an excessive amount of training data by leveraging a world model during training.
  • The policy learns how to recover from errors by aligning with states observed in human demonstrations during end-to-end training.
  • Qualitative and quantitative results demonstrate significant improvements upon prior state of the art in closed-loop testing in the CARLA simulator.


Source: Arxiv

DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications for Multi-Task RL

  • A new learning approach has been proposed to efficiently satisfy complex Linear Temporal Logic (LTL) specifications in multi-task reinforcement learning (RL).
  • Existing approaches for satisfying LTL specifications suffer from various limitations, such as only being applicable to finite-horizon fragments of LTL, suboptimal solutions, and insufficient handling of safety constraints.
  • The proposed method uses Büchi automata to represent the semantics of LTL specifications and learns policies based on sequences of truth assignments.
  • Experiments show that the approach can zero-shot satisfy a wide range of specifications, both finite- and infinite-horizon, and outperforms existing methods in terms of satisfaction probability and efficiency.


Source: Arxiv

ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery

  • ScienceAgentBench is a benchmark for evaluating language agents for data-driven scientific discovery.
  • It aims to assess the capabilities of large language models (LLMs) in automating scientific discovery tasks.
  • The benchmark includes 102 tasks extracted from peer-reviewed publications in four disciplines, with validation from subject matter experts.
  • Results show that current language agents have limitations in generating code for data-driven discovery and end-to-end automation of scientific research.


Source: Arxiv

MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models

  • Researchers introduce MMIE, a large-scale benchmark for evaluating multimodal comprehension and generation in Large Vision-Language Models (LVLMs).
  • MMIE consists of 20K curated multimodal queries covering various categories and subfields.
  • The benchmark supports interleaved inputs and outputs, evaluating competencies through multiple-choice and open-ended questions.
  • An automated evaluation metric with reduced bias and improved accuracy is proposed.

