
ML News

Source: Arxiv

Scaling Laws for Uncertainty in Deep Learning

  • Deep learning research has revealed scaling laws that predict model performance as a function of dataset and model size.
  • Researchers are exploring whether similar scaling laws govern predictive uncertainties in deep learning.
  • In identifiable parametric models, scaling laws for uncertainty can be derived by treating model parameters in a Bayesian way.
  • These guarantees on uncertainty contraction rates, however, do not hold in over-parameterized models.
  • Empirical evidence nonetheless shows scaling laws for predictive uncertainty with respect to both dataset and model size (a power-law fit of this kind is sketched after this list).
  • Experiments on vision and language tasks confirm these scaling laws using Bayesian inference and ensemble methods.
  • This research challenges skepticism towards Bayesian approaches in deep learning.
  • Having a large amount of data is not sufficient to eliminate epistemic uncertainty.
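
Empirical scaling laws of this kind are typically summarized with a power-law fit. Below is a minimal, hedged sketch of such a fit on synthetic data: the functional form u(N) = a·N^(-b) + c and all numbers are illustrative assumptions, not the paper's measurements; the constant floor c echoes the point that more data alone does not drive uncertainty to zero.

```python
import numpy as np
from scipy.optimize import curve_fit

# Illustrative power-law fit of predictive uncertainty u vs. dataset size N.
# Data here is synthetic; the paper's curves come from Bayesian inference
# and ensembles on real vision and language tasks.
def power_law(N, a, b, c):
    return a * N ** (-b) + c   # c: floor that more data alone cannot remove

N = np.array([1e3, 3e3, 1e4, 3e4, 1e5, 3e5])
u = 2.0 * N ** (-0.3) + 0.05 + np.random.default_rng(0).normal(0, 0.002, N.size)

(a, b, c), _ = curve_fit(power_law, N, u, p0=(1.0, 0.5, 0.0))
print(f"u(N) ≈ {a:.2f} * N^(-{b:.2f}) + {c:.3f}")
```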


Source: Arxiv

HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios

  • Action segmentation is a core challenge in high-level video understanding: partitioning untrimmed videos into segments and assigning each a predefined action label.
  • Existing methods mainly address single-person activities, leaving multi-person scenarios underexplored.
  • A new dataset, RHAS133, is introduced for Referring Human Action Segmentation in multi-person settings, comprising 133 movies with annotations for 137 actions and accompanying textual descriptions.
  • Benchmarking existing methods on RHAS133 shows limited ability to aggregate visual cues for the target individual.
  • To improve action segmentation in multi-person scenarios, a new framework called HopaDIFF is proposed.
  • HopaDIFF leverages a holistic-partial aware Fourier-conditioned diffusion approach and a novel cross-input gate attentional xLSTM for enhanced long-range reasoning.
  • The framework introduces a Fourier condition to gain finer control over action segmentation generation (a minimal sketch of such a condition follows this list).
  • HopaDIFF achieves state-of-the-art results on RHAS133 across various evaluation scenarios.
  • The code for HopaDIFF is available at https://github.com/KPeng9510/HopaDIFF.git.
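
The paper's exact Fourier condition is defined in the linked repository; the sketch below only illustrates the generic idea of deriving a frequency-domain conditioning signal from per-frame features. The function name, tensor shapes, and cutoff k are all hypothetical assumptions.

```python
import torch

def fourier_condition(frame_feats: torch.Tensor, k: int = 16) -> torch.Tensor:
    """Illustrative Fourier-domain condition: keep the k lowest temporal
    frequencies of a (T, D) frame-feature sequence as a global descriptor.
    Shapes and the choice of k are hypothetical, not HopaDIFF's design."""
    spec = torch.fft.rfft(frame_feats, dim=0)    # (T//2+1, D), complex
    low = spec[:k]                               # low temporal frequencies only
    return torch.cat([low.real, low.imag], dim=0).flatten()  # real condition vector

feats = torch.randn(240, 64)     # e.g. 240 frames of 64-d features
cond = fourier_condition(feats)
print(cond.shape)                # torch.Size([2048])
```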


Source: Arxiv

DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy

  • DipLLM is a fine-tuned Large Language Model (LLM) designed for strategic decision-making in Diplomacy, a complex multiplayer game that combines cooperation and competition.
  • Traditional methods for AI in Diplomacy rely on equilibrium search, requiring extensive game data and computational resources.
  • LLMs offer an alternative by leveraging pre-trained knowledge for strong performance with limited fine-tuning.
  • However, applying LLMs to Diplomacy is challenging due to the game's complexity and strategic interactions among players.
  • DipLLM simplifies the task with an autoregressive factorization framework that breaks multi-unit action assignment into a sequence of unit-level decisions (sketched after this list).
  • The model is fine-tuned to learn an equilibrium policy and outperforms Cicero using only 1.5% of its training data.
  • This research demonstrates the potential of fine-tuned LLMs for complex strategic decision-making in multiplayer games.
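
A hedged sketch of the autoregressive factorization summarized above: the joint probability of a multi-unit action decomposes as log p(a_1..a_n | s) = Σ_i log p(a_i | s, a_<i), so units are assigned one at a time, each conditioned on earlier assignments. The `score_action` stub stands in for an LLM's scoring; all names and the toy action sets are invented.

```python
import math, random

# score_action is a hypothetical stand-in for an LLM scoring a unit-level
# action given the game state and previously assigned unit actions.
def score_action(state: str, chosen: list, candidate: str) -> float:
    return random.random()  # placeholder logit

def pick_joint_action(state: str, units: dict):
    chosen, joint_logprob = [], 0.0
    for unit, candidates in units.items():
        logits = [score_action(state, chosen, f"{unit}:{c}") for c in candidates]
        z = max(logits)
        weights = [math.exp(l - z) for l in logits]   # softmax numerators
        total = sum(weights)
        best = max(range(len(candidates)), key=lambda i: weights[i])
        chosen.append(f"{unit}:{candidates[best]}")
        joint_logprob += math.log(weights[best] / total)  # log p(a_i | s, a_<i)
    # log p(a_1..a_n | s) accumulates as a sum of unit-level terms.
    return chosen, joint_logprob

units = {"A PAR": ["H", "BUR", "PIC"], "F BRE": ["H", "MAO", "ENG"]}
print(pick_joint_action("Spring 1901", units))
```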


Source: Arxiv

Intent Factored Generation: Unleashing the Diversity in Your Language Model

  • Obtaining multiple diverse samples from a Large Language Model for the same prompt is a persistent challenge.
  • Current methods operate on token-level diversity, which leads to repetitive responses.
  • Intent Factored Generation (IFG) is proposed to address these diversity and engagement issues.
  • IFG first samples a semantic intent and then generates a response conditioned on both the intent and the prompt.
  • A higher temperature is used for the intent step to promote diversity, and a lower temperature for the final generation to keep coherence (see the sketch after this list).
  • Prompting the model to state its intent before generating also enhances reasoning tasks.
  • IFG proves effective at improving pass@k and Reinforcement Learning results on math and code tasks.
  • Combined with Direct Preference Optimization, IFG enhances conversational diversity without loss in reward.
  • IFG maintains diversity and quality in general language modeling on a dataset of news articles and reader comments.
  • IFG is a simple method for boosting diversity in Large Language Models while preserving performance, and it integrates easily into various algorithms and applications.
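
A minimal sketch of the two-temperature sampling scheme described above. The `llm` stub is a toy stand-in for a real text-generation API, and the prompt templates are invented; only the structure (high-temperature intent, low-temperature realization) follows the summary.

```python
import random

# Toy stand-in for an LLM call: (prompt, temperature) -> text. Replace with
# a real client; the canned outputs just make the sketch runnable end-to-end.
def llm(prompt: str, temperature: float) -> str:
    canned = ["compare both options", "give a worked example",
              "argue from first principles"]
    return random.choice(canned) if temperature > 1.0 else f"[answer: {prompt[-50:]}]"

def ifg_sample(prompt: str, t_intent: float = 1.2, t_final: float = 0.4) -> str:
    # Stage 1: sample a semantic intent at high temperature (diversity).
    intent = llm(f"{prompt}\nState the intent of your answer in one sentence:",
                 t_intent)
    # Stage 2: realize the answer at low temperature (coherence),
    # conditioned on both the prompt and the sampled intent.
    return llm(f"{prompt}\nIntent: {intent}\nAnswer:", t_final)

print(ifg_sample("How should I study for a math exam?"))
```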


Source: Arxiv

CINeMA: Conditional Implicit Neural Multi-Modal Atlas for a Spatio-Temporal Representation of the Perinatal Brain

  • Researchers have introduced CINeMA, a novel framework for building high-resolution, spatio-temporal, multimodal brain atlases in low-data settings.
  • CINeMA operates in latent space, avoiding compute-intensive image registration and cutting atlas construction time from days to minutes.
  • The framework allows flexible conditioning on subject attributes such as gestational age and birth age, and on pathologies such as ventriculomegaly and agenesis of the corpus callosum (the conditioning idea is sketched after this list).
  • CINeMA supports tasks like tissue segmentation, age prediction, synthetic data creation, and anatomically informed data augmentation.
  • The framework surpasses existing methods in accuracy, efficiency, and versatility, making it a valuable tool for advancing brain research.
  • The code and atlases for CINeMA are available at https://github.com/m-dannecker/CINeMA.
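
CINeMA's actual architecture is in the linked repository; the sketch below shows only the generic idea behind a conditional implicit neural representation: an MLP mapping a spatial coordinate plus conditioning variables (e.g. gestational age, a pathology flag) to intensity and tissue-class logits. Layer sizes and the conditioning layout are assumptions.

```python
import torch
import torch.nn as nn

# Generic conditional implicit neural representation (not CINeMA's exact net):
# f(x, y, z, condition) -> [intensity, segmentation logits].
class ConditionalINR(nn.Module):
    def __init__(self, n_cond: int = 2, n_classes: int = 4, width: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + n_cond, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, 1 + n_classes),  # intensity + tissue logits
        )

    def forward(self, xyz: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # Broadcast the condition vector to every queried coordinate.
        return self.net(torch.cat([xyz, cond.expand(xyz.shape[0], -1)], dim=-1))

model = ConditionalINR()
coords = torch.rand(1024, 3)             # query points in atlas space
cond = torch.tensor([[32.0, 0.0]])       # e.g. gestational age 32 wks, no pathology
print(model(coords, cond).shape)         # torch.Size([1024, 5])
```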


Source: Arxiv

Adding simple structure at inference improves Vision-Language Compositionality

  • Dual-encoder Vision-Language Models (VLMs) such as CLIP struggle with compositionality, which hurts their retrieval performance.
  • Various training-time methods have been proposed to improve the vision-language compositionality of these models.
  • This study instead adds simple structure at inference time to address the compositionality issue.
  • The proposed method divides images into smaller crops, extracts text segments describing objects, attributes, and relations, and aligns image crops with text segments using a VLM.
  • The final image-text similarity is computed by aggregating the individual similarities of matched image crops and text segments (aggregation sketched after this list).
  • The approach is evaluated with popular dual-encoder VLMs on controlled and natural vision-language compositionality datasets, showing consistent performance improvements without any additional training.
  • Significant gains are observed in attribute-object binding, particularly on the controlled dataset.
  • Analysis shows that processing image crops is key to the performance gains and highlights directions for further improving inference-time techniques.
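
A hedged sketch of the inference-time aggregation described above: each text segment is matched to its best-scoring image crop and the matched similarities are averaged into a single image-text score. The random "encoders" are placeholders for a real dual-encoder VLM such as CLIP; the matching and aggregation rules are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder embedders: swap in real VLM image/text encoders in practice.
def embed_crops(crops):    return rng.normal(size=(len(crops), 512))
def embed_segments(segs):  return rng.normal(size=(len(segs), 512))

def structured_similarity(crops, segments) -> float:
    I, T = embed_crops(crops), embed_segments(segments)
    I /= np.linalg.norm(I, axis=1, keepdims=True)      # unit-norm embeddings
    T /= np.linalg.norm(T, axis=1, keepdims=True)
    sim = T @ I.T                      # (num_segments, num_crops) cosine sims
    best = sim.max(axis=1)             # best-matching crop per text segment
    return float(best.mean())          # aggregate into one image-text score

crops = ["full image", "top-left crop", "bottom-right crop"]
segments = ["a red cube", "on a blue mat"]
print(structured_similarity(crops, segments))
```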


Source: Arxiv

Empirical and computer-aided robustness analysis of long-step and accelerated methods in smooth convex optimization

  • This work examines the robustness of first-order optimization methods to relative inexactness in gradient computations.
  • Three major families of methods are analyzed: constant-step gradient descent, long-step methods, and accelerated methods.
  • Theory initially suggests that long-step and accelerated methods are not robust to such inexactness.
  • A semi-heuristic shortening factor is introduced to strengthen the theoretical guarantees of long-step and accelerated methods (the setting is sketched after this list).
  • All methods are tested on an inexact problem, showing that accelerated methods are more robust than expected and that the shortening factor significantly helps long-step methods.
  • The study concludes that all shortened methods appear promising, even in an inexact setting.
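
A small, self-contained illustration of the setting, under stated assumptions: gradient descent on a synthetic strongly convex quadratic where every gradient carries relative error ||g - ∇f(x)|| ≤ ε·||∇f(x)||, with the step size damped by a shortening factor. The factor value 0.7 and the problem are invented for illustration; the paper's computer-aided analysis of long-step and accelerated schemes is far richer.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(20, 20)); A = A.T @ A / 20 + np.eye(20)  # SPD Hessian
L = np.linalg.eigvalsh(A).max()                               # smoothness constant

def inexact_grad(x, eps=0.3):
    g = A @ x
    noise = rng.normal(size=g.shape)
    noise *= eps * np.linalg.norm(g) / np.linalg.norm(noise)  # relative error
    return g + noise

def gd(x0, steps=200, eps=0.3, shorten=0.7):
    x = x0.copy()
    for _ in range(steps):
        x -= shorten * (1.0 / L) * inexact_grad(x, eps)       # shortened step
    return 0.5 * x @ A @ x                                    # objective value

x0 = rng.normal(size=20)
print(gd(x0, shorten=1.0), gd(x0, shorten=0.7))               # plain vs shortened
```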


Source: Arxiv

Alice and the Caterpillar: A more descriptive null model for assessing data mining results

  • Researchers introduce novel null models for assessing data mining results with statistical hypothesis testing.
  • These null models preserve more properties of observed binary transactional and sequence datasets than existing models.
  • The new models maintain the Bipartite Joint Degree Matrix of the dataset's corresponding bipartite (multi-)graph.
  • In particular, they preserve the number of caterpillars, i.e. paths of length three (a degree-based count of these is sketched after this list).
  • The researchers developed a suite named Alice, leveraging Markov chain Monte Carlo algorithms to sample datasets from the null models.
  • Alice is based on a well-defined set of states and efficient operations for transitioning between them.
  • Experimental results show that Alice mixes quickly, scales well, and uncovers different significant results than existing null models.
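
For intuition: in the bipartite transactions-items graph, every path of length three has a middle edge (t, i), and any other neighbor of t paired with any other neighbor of i completes one such path, so the caterpillar count depends only on edge-endpoint degrees. That is why preserving the Bipartite Joint Degree Matrix preserves it. A minimal sketch, with a toy dataset invented for illustration:

```python
from collections import defaultdict

def caterpillars(edges) -> int:
    """Count paths of length three in a bipartite graph given as
    (transaction, item) edges: sum over edges of (deg(t)-1) * (deg(i)-1)."""
    dt, di = defaultdict(int), defaultdict(int)
    for t, i in edges:
        dt[t] += 1
        di[i] += 1
    return sum((dt[t] - 1) * (di[i] - 1) for t, i in edges)

# Toy dataset: transactions x items as bipartite edges (a 6-cycle here,
# which has exactly 6 paths of length three).
edges = [(0, "a"), (0, "b"), (1, "a"), (1, "c"), (2, "b"), (2, "c")]
print(caterpillars(edges))  # 6
```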


Source: Arxiv

Learning to Optimize Package Picking for Large-Scale, Real-World Robot Induction

  • Machine learning can enhance operational efficiency and reduce costs in large-scale warehouse robot fleets.
  • Current research focuses on raising picking success rates by prioritizing high-probability picks, but lacks data-driven optimization of performance at scale.
  • A new ML-based framework was developed to predict transform adjustments and optimize suction-cup selection for multi-suction end effectors when picking packages.
  • The framework was tested in workcells resembling Amazon Robotics' Robot Induction fleet, yielding a 20% reduction in pick failure rates compared to heuristic methods.


Source: Arxiv

Automatic Treatment Planning using Reinforcement Learning for High-dose-rate Prostate Brachytherapy

  • Researchers investigated reinforcement learning (RL) for high-dose-rate (HDR) prostate brachytherapy, optimizing needle placement based on patient anatomy.
  • The RL agent adjusts needle positions and dwell times to maximize a reward function, playing multiple rounds until all needles are optimized (a toy version of this loop is sketched after this list).
  • Data from 11 patients were included; RL plans matched clinical plans on prostate coverage and rectum dose while achieving lower prostate hotspot and urethra doses.
  • RL plans used, on average, two fewer needles than clinical plans, suggesting gains in both efficiency and plan quality.
  • The study demonstrates the feasibility of RL for autonomously generating practical HDR prostate brachytherapy plans, offering standardized planning and potentially better patient outcomes.
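
The study's environment and reward are clinical dosimetry models; the toy sketch below only mirrors the loop structure described above (adjust a needle, update dwell times, receive a scalar reward). Every quantity, the environment, and the random-search "agent" are invented placeholders.

```python
import random

# ToyBrachyEnv is an invented placeholder: a real state would be patient
# anatomy and dose maps, and the reward a dosimetric objective.
class ToyBrachyEnv:
    def __init__(self, n_needles: int = 12):
        self.pos = [random.random() for _ in range(n_needles)]
        self.dwell = [1.0] * n_needles

    def step(self, needle: int, d_pos: float, d_dwell: float) -> float:
        self.pos[needle] = min(1.0, max(0.0, self.pos[needle] + d_pos))
        self.dwell[needle] = max(0.0, self.dwell[needle] + d_dwell)
        coverage = min(1.0, sum(self.dwell) / len(self.dwell))  # stand-in metric
        hotspot = max(self.dwell)                               # stand-in metric
        return coverage - 0.3 * hotspot                         # reward to maximize

env = ToyBrachyEnv()
best = float("-inf")
for _ in range(200):  # random search as the simplest placeholder "agent"
    r = env.step(random.randrange(12),
                 random.uniform(-0.1, 0.1), random.uniform(-0.2, 0.2))
    best = max(best, r)
print(f"best reward found: {best:.3f}")
```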


Source: Arxiv

CoRT: Code-integrated Reasoning within Thinking

  • Large Reasoning Models (LRMs) like o1 and DeepSeek-R1 have made progress in natural language reasoning with long chain-of-thought (CoT) but struggle with complex mathematical operations.
  • A Code Interpreter (CI) brings external knowledge and exact computation to LRMs, but combining the two directly poses challenges.
  • CoRT is a post-training framework designed to teach LRMs to effectively use a CI for complex mathematical operations.
  • Data scarcity is addressed by synthesizing code-integrated reasoning data through Hint-Engineering, which strategically inserts hints to optimize LRM-CI interaction (the core move is sketched after this list).
  • 30 high-quality samples are manually created to post-train models ranging from 1.5B to 32B parameters using supervised fine-tuning, rejection fine-tuning, and reinforcement learning.
  • Hint-Engineering models show 4% and 8% absolute improvements on DeepSeek-R1-Distill-Qwen-32B and DeepSeek-R1-Distill-Qwen-1.5B respectively across five challenging mathematical reasoning datasets.
  • Hint-Engineering models use about 30% fewer tokens at 32B and 50% fewer at 1.5B compared to models reasoning in natural language alone.
  • Experimental results demonstrate the effectiveness of CoRT in improving LRMs' performance on mathematical reasoning tasks.
  • The models and code for CoRT are available at https://github.com/ChengpengLi1003/CoRT.
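
The actual Hint-Engineering pipeline is in the linked repository; the sketch below only illustrates the core move the summary describes: splicing a hint into a reasoning trace at the point where computation should be delegated to a code interpreter. The hint text and trigger phrase are invented for illustration.

```python
# Hypothetical hint of the kind that steers a reasoning model toward
# delegating error-prone arithmetic to a code interpreter.
HINT = ("Wait, this arithmetic is error-prone by hand; "
        "I should compute it with Python code instead.")

def insert_hint(trace: str, trigger: str) -> str:
    """Insert HINT right after the first occurrence of `trigger` in a trace."""
    idx = trace.find(trigger)
    if idx == -1:
        return trace
    cut = idx + len(trigger)
    return trace[:cut] + " " + HINT + trace[cut:]

trace = "We need 17^6 mod 97. Expanding the power directly, 17^6 = ..."
print(insert_hint(trace, "17^6 mod 97."))
```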


Source: Arxiv

A Deep Generative Model for the Simulation of Discrete Karst Networks

  • Simulating discrete karst networks is challenging because of the complex physicochemical processes at play in geological and hydrogeological contexts.
  • A novel approach using graph generative models to represent karst networks is proposed.
  • Karst networks are represented as graphs whose nodes carry spatial information and whose edges indicate connections between nodes (this representation is sketched after this list).
  • The generative process uses a graph recurrent neural network (GraphRNN) to learn the topological distribution of karst networks.
  • A denoising diffusion probabilistic model on graphs (G-DDPM) is then used to learn node features such as spatial coordinates.
  • The approach aims to generate realistic karst networks that capture the essential features of the original data.
  • The approach was tested on real-world karst networks by comparing generated subgraphs with actual subgraphs.
  • Geometry and topology metrics were used to evaluate the generated subgraphs.
  • The methodology enables stochastic simulation of discrete karst networks across various formations.
  • It serves as a useful tool for studying physical processes such as flow and transport in karst environments.
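
A minimal sketch of the graph representation described above: nodes carrying spatial coordinates, edges as conduits, plus the kind of simple geometry and topology statistics used for evaluation. The tiny network and its coordinates are invented; the GraphRNN and G-DDPM models themselves are beyond this sketch.

```python
import networkx as nx

# Karst network as a graph: nodes hold 3D positions, edges are conduits.
G = nx.Graph()
nodes = {0: (0.0, 0.0, 0.0), 1: (10.0, 2.0, -1.0),
         2: (18.0, -3.0, -2.5), 3: (12.0, 9.0, -1.2)}
for n, xyz in nodes.items():
    G.add_node(n, pos=xyz)
G.add_edges_from([(0, 1), (1, 2), (1, 3)])

def edge_length(u, v):
    pu, pv = G.nodes[u]["pos"], G.nodes[v]["pos"]
    return sum((a - b) ** 2 for a, b in zip(pu, pv)) ** 0.5

lengths = [edge_length(u, v) for u, v in G.edges]   # geometry statistic
degrees = [d for _, d in G.degree]                  # topology statistic
print(sum(lengths) / len(lengths), degrees)
```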


Source: Arxiv

Advancing Exchange Rate Forecasting: Leveraging Machine Learning and AI for Enhanced Accuracy in Global Financial Markets

  • Forecasting foreign exchange rates such as USD to BDT is critical in global financial markets, affecting trade and economic stability.
  • This study uses historical USD/BDT data from 2018-2023 to develop machine learning models for accurate forecasting.
  • A Long Short-Term Memory (LSTM) neural network achieves 99.449% accuracy with an RMSE of 0.9858, outperforming ARIMA (an LSTM forecaster of this kind is sketched after this list).
  • A Gradient Boosting Classifier (GBC) is also employed for directional prediction, yielding a 40.82% profitable trade rate.
  • Analysis of historical trends shows a decline in the BDT/USD rate, and normalized daily returns are incorporated to capture volatility.
  • Deep learning in forex forecasting offers traders and policymakers robust tools for mitigating risk in financial markets.
  • Future work may integrate sentiment analysis and real-time economic indicators for greater model adaptability.
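
A compact sketch of an LSTM one-step-ahead forecaster of the kind the study describes, written in PyTorch and trained on a synthetic series standing in for USD/BDT. The window size, architecture, and training setup are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
series = torch.cumsum(torch.randn(500) * 0.1, dim=0) + 85.0  # synthetic "rate"
window = 30
X = torch.stack([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]

class Forecaster(nn.Module):
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x.unsqueeze(-1))       # (B, T, hidden)
        return self.head(out[:, -1]).squeeze(-1)  # next-step prediction

model = Forecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(50):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)
    loss.backward()
    opt.step()
print(f"training RMSE: {loss.sqrt().item():.4f}")
```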


Source: Arxiv

PersonaLens: A Benchmark for Personalization Evaluation in Conversational AI Assistants

  • Large language models have improved conversational AI assistants, but evaluating how well these assistants personalize is challenging.
  • Existing personalization benchmarks do not capture the complexities of personalized task-oriented assistance.
  • To address this gap, PersonaLens is introduced: a benchmark for evaluating personalization in task-oriented AI assistants.
  • PersonaLens includes diverse user profiles with rich preferences and interaction histories, along with specialized LLM-based user and judge agents.
  • The user agent engages the assistant under test in realistic task-oriented dialogues, while the judge agent assesses personalization, response quality, and task success (a toy version of this loop is sketched after this list).
  • Extensive experiments with current LLM assistants across diverse tasks reveal significant variability in personalization capabilities.
  • PersonaLens provides crucial insights for advancing conversational AI systems.
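
A toy sketch of the user-agent / judge-agent loop described above. The `llm` stub, the prompts, and the profile fields are invented stand-ins; PersonaLens's actual agents, profiles, and scoring rubrics differ.

```python
# Placeholder for an LLM call; swap in a real client to make this meaningful.
def llm(prompt: str) -> str:
    return "[model output for: " + prompt[:40] + "...]"

def run_episode(profile: dict, task: str, assistant, turns: int = 3) -> str:
    history = []
    for _ in range(turns):
        # LLM-based user agent: plays a persona pursuing a concrete task.
        user_msg = llm(f"As a user with profile {profile}, pursue task "
                       f"'{task}'. History so far: {history}")
        history.append(("user", user_msg))
        history.append(("assistant", assistant(user_msg)))
    # LLM-based judge agent: scores the finished dialogue.
    return llm(f"Judge this dialogue for personalization, response quality, "
               f"and task success given profile {profile}: {history}")

profile = {"cuisine": "vegan", "budget": "low"}
print(run_episode(profile, "book a dinner reservation", assistant=llm))
```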


Source: Arxiv

Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy

  • Kvasir-VQA-x1 is a new multimodal dataset designed for medical reasoning and robust MedVQA in gastrointestinal endoscopy.
  • The dataset addresses the limitations of current datasets by incorporating 159,549 new question-answer pairs to test deeper clinical reasoning.
  • Questions in the dataset are stratified by complexity to evaluate a model's inference capabilities more effectively.
  • To prepare models for real-world clinical scenarios, visual augmentations that simulate common imaging artifacts have been included in the dataset.
  • Kvasir-VQA-x1 supports two evaluation tracks: one for standard VQA performance and the other to assess model robustness against visual perturbations.
  • The dataset aims to accelerate the development of more reliable and effective AI systems for clinical use by providing a challenging and clinically relevant benchmark.
  • Kvasir-VQA-x1 adheres to FAIR data principles, ensuring accessibility and transparency for the wider research community.
  • Code and data related to the dataset can be found on GitHub at https://github.com/Simula/Kvasir-VQA-x1
  • The dataset itself is hosted at https://huggingface.co/datasets/SimulaMet/Kvasir-VQA-x1 (a loading sketch follows this list).
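
Since the dataset is hosted on the Hugging Face hub, loading it should follow the standard `datasets` pattern sketched below. The repository id comes from the link above; the split name and field layout are assumptions, so consult the dataset card for the actual schema.

```python
from datasets import load_dataset

# Repository id taken from the hub link above; "train" split is an assumption.
ds = load_dataset("SimulaMet/Kvasir-VQA-x1", split="train")
print(ds)  # the dataset card lists the real columns and splits

sample = ds[0]
print({k: type(v).__name__ for k, v in sample.items()})  # inspect field types
```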

