techminis | A naukri.com initiative

ML News

Marktechpost

How AI Models Learn to Solve Problems That Humans Can’t

  • Researchers have developed the Easy-to-Hard Generalization (E2H) methodology to tackle alignment issues in complex tasks without relying on human feedback.
  • The methodology involves Process-Supervised Reward Models (PRMs), Easy-to-Hard generalization, and Iterative Refinement.
  • The E2H methodology enables AI models to shift from depending on human feedback to learning complex tasks with far fewer human annotations (a minimal sketch of the idea follows this list).
  • The method demonstrates significant improvements in performance and reduces the need for human-labeled data on complex tasks.
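
Not the authors' released code, but a minimal sketch of how a reward model trained on easier, human-annotated problems might be reused on harder ones: sampled solutions are reranked by the reward model, and the best ones feed an iterative refinement loop. All interfaces here (reward_model, generate_candidates, policy.sample, policy.finetune) are hypothetical stand-ins.

    # Minimal easy-to-hard sketch: a reward model trained only on easy problems
    # scores candidate solutions for harder ones. Hypothetical interfaces only.

    def best_of_n(problem, generate_candidates, reward_model, n=16):
        """Rerank sampled solutions with the easy-trained reward model."""
        candidates = generate_candidates(problem, num_samples=n)
        scores = [reward_model(problem, c) for c in candidates]
        return max(zip(scores, candidates), key=lambda sc: sc[0])[1]

    def iterative_refinement(hard_problems, policy, reward_model, rounds=3):
        """Repeatedly fine-tune the policy on its own highest-reward solutions."""
        for _ in range(rounds):
            selected = [(p, best_of_n(p, policy.sample, reward_model))
                        for p in hard_problems]
            policy.finetune(selected)  # hypothetical fine-tuning interface
        return policy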


Marktechpost

Scaling Language Model Evaluation: From Thousands to Millions of Tokens with BABILong

  • BABILong is a new benchmark designed to evaluate language models’ reasoning capabilities across long documents. Its construction scales to sequences of up to 50 million tokens, making it uniquely suited for evaluating next-generation models, and initial testing reveals significant limitations in current models.
  • Testing across various question-answering tasks demonstrates that most current LLMs effectively use only 10-20% of their advertised context window. While models like GPT-4 and Llama-3.1-70b maintain effectiveness up to 16K tokens, most models struggle beyond 4K tokens. Fine-tuning experiments proved particularly revealing, showing that even relatively small models like RMT and ARMT (137M parameters) can effectively handle BABILong tasks.
  • BABILong encompasses 20 distinct reasoning tasks and uses books from the PG19 corpus as source material, scattering the facts needed to answer each question inside long distractor text (a toy version of this construction is sketched after this list). This synthetic approach ensures immunity to training-data contamination. Testing revealed that even advanced language models like GPT-4 and Gemini 1.5 Pro effectively use only 5-25% of their input context.
  • Earlier long-context benchmarks such as LRA, L-Eval, and LongBench handle sequences of only 16,000-60,000 tokens, while more recent efforts like InfinityBench and ChapterBreak reach up to 636,000 tokens. BABILong instead allows unlimited scaling of context length, enabling the evaluation of models with context windows of millions of tokens.
  • The new methodology for evaluating language models is unique and helps evaluate long-term reasoning abilities. Fine-tuned recurrent memory models, particularly ARMT, show remarkable capabilities, processing sequences up to 50 million tokens with consistent performance.
  • Recently developed benchmarks like Long Align, LongICLBench, InfinityBench, ChapterBreak, and BABILong focus on in-context learning and instruction following, pushing the boundaries further by handling sequences of up to 636K tokens and, in BABILong’s case, beyond.
  • Advancements in Large Language Models (LLMs) and neural architectures have significantly expanded capabilities, particularly in processing longer contexts. These improvements have profound implications for various applications.
  • The BABILong benchmark is a significant advancement in evaluating language models’ long-context capabilities through its unique combination of scalability and diverse reasoning tasks.
  • Recent developments show promising improvements, with Qwen-2.5 models leading among open LLMs. The evaluation also explored alternative approaches, including Retrieval-Augmented Generation (RAG) and fine-tuned models. While RAG demonstrates limited success, fine-tuned recurrent memory models, particularly ARMT, show remarkable capabilities.
  • Current evaluation benchmarks remain limited to about 40,000 tokens, creating a significant gap between model capabilities and evaluation methods.
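
For illustration only (not the benchmark’s actual generator): a BABILong-style example can be assembled by scattering the supporting facts of a bAbI-style task inside long distractor text. BABILong draws that text from PG19 books; the sketch below uses an arbitrary list of filler sentences and approximates token counts with word counts.

    import random

    def make_long_context_example(facts, question, filler_sentences, target_tokens=50_000):
        """Hide supporting facts inside long distractor text (toy BABILong-style example)."""
        context, n_tokens = [], 0
        while n_tokens < target_tokens:
            sentence = random.choice(filler_sentences)
            context.append(sentence)
            n_tokens += len(sentence.split())
        # Scatter the supporting facts at random positions in the distractor text.
        for fact in facts:
            context.insert(random.randrange(len(context) + 1), fact)
        return " ".join(context), question

    doc, q = make_long_context_example(
        facts=["Mary went to the kitchen.", "Mary picked up the apple."],
        question="Where is the apple?",
        filler_sentences=["It was a quiet evening in the old house."],
    )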


Medium

Exploring Real-World Use Cases of Machine Learning Development Across Industries

  • Machine learning has revolutionized healthcare, enabling early disease detection and personalized treatment plans.
  • Retailers use machine learning to analyze customer behavior and improve personalization.
  • Manufacturers employ machine learning to streamline production processes and reduce downtime.
  • The financial sector benefits from AI-driven solutions for fraud detection and customer relationship management.


Arxiv

Heterogeneous Multi-Agent Reinforcement Learning for Distributed Channel Access in WLANs

  • This paper explores the application of multi-agent reinforcement learning (MARL) to address distributed channel access in wireless local area networks (WLANs).
  • The study focuses on the practical scenario where agents heterogeneously adopt value-based or policy-based reinforcement learning algorithms for training.
  • The researchers propose a heterogeneous MARL training framework called QPMIX, which enables collaboration among heterogeneous agents through centralized training and distributed execution (a generic value-mixing sketch in that spirit follows this list).
  • Simulation results demonstrate that the QPMIX algorithm improves network throughput and reduces mean delay, delay jitter, and collision rates compared with conventional carrier-sense multiple access with collision avoidance (CSMA/CA) in saturated traffic scenarios.
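
QPMIX’s actual architecture is defined in the paper; as a rough illustration of centralized training with decentralized execution, the sketch below is a generic QMIX-style monotonic mixing network that combines per-agent values into one joint training signal, while each agent still acts from its own local observation at execution time. This is an illustrative stand-in, not QPMIX.

    import torch
    import torch.nn as nn

    class MonotonicMixer(nn.Module):
        """Toy centralized mixer: combines per-agent values into a joint value.

        Non-negative mixing weights keep the joint value monotonic in every
        agent's value (the usual QMIX-style constraint). Used only in training;
        at execution each agent acts on its local observation alone.
        """
        def __init__(self, n_agents, state_dim, hidden=32):
            super().__init__()
            self.hyper_w1 = nn.Linear(state_dim, n_agents * hidden)
            self.hyper_b1 = nn.Linear(state_dim, hidden)
            self.hyper_w2 = nn.Linear(state_dim, hidden)
            self.hyper_b2 = nn.Linear(state_dim, 1)
            self.n_agents, self.hidden = n_agents, hidden

        def forward(self, agent_values, state):
            # agent_values: (batch, n_agents); state: (batch, state_dim)
            w1 = torch.abs(self.hyper_w1(state)).view(-1, self.n_agents, self.hidden)
            b1 = self.hyper_b1(state).unsqueeze(1)
            h = torch.relu(torch.bmm(agent_values.unsqueeze(1), w1) + b1)
            w2 = torch.abs(self.hyper_w2(state)).unsqueeze(2)
            b2 = self.hyper_b2(state)
            return torch.bmm(h, w2).squeeze(2) + b2  # (batch, 1) joint value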


Arxiv

A Survey on Inference Optimization Techniques for Mixture of Experts Models

  • A new survey analyzes inference optimization techniques for Mixture of Experts (MoE) models.
  • The survey categorizes optimization approaches into model-level, system-level, and hardware-level optimizations.
  • Model-level optimizations include architectural innovations, compression techniques, and algorithm improvements.
  • System-level optimizations investigate distributed computing approaches, load balancing mechanisms, and efficient scheduling algorithms.


Arxiv

Towards Precise Prediction Uncertainty in GNNs: Refining GNNs with Topology-grouping Strategy

  • Recent advancements in graph neural networks (GNNs) have highlighted the critical need for calibrating model predictions, with neighborhood prediction similarity recognized as a pivotal component.
  • Existing approaches fold neighborhood similarity into node-wise temperature scaling, but the assumption that similarity alone determines calibration behavior does not hold universally and can lead to sub-optimal calibration.
  • The new approach, called Simi-Mailbox, categorizes nodes by both neighborhood similarity and their own confidence, allowing fine-grained calibration using group-specific temperature scaling (a toy version of this grouping is sketched after this list).
  • Extensive experiments demonstrate the effectiveness of Simi-Mailbox, achieving up to 13.79% error reduction compared to uncalibrated GNN predictions.
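
Not the authors’ implementation; a minimal sketch of the grouping idea under stated assumptions: bin nodes by neighborhood prediction similarity and their own confidence, then fit one temperature per bin on validation nodes, here by a simple grid search over the negative log-likelihood.

    import numpy as np

    def group_temperature_scaling(logits, neighbor_sim, labels, val_mask,
                                  n_sim_bins=5, n_conf_bins=5):
        """Toy group-wise temperature scaling keyed on (neighborhood similarity, confidence).

        logits:       (N, C) pre-softmax GNN outputs
        neighbor_sim: (N,) agreement between each node's prediction and its neighbors'
        labels:       (N,) ground-truth classes, used only on validation nodes
        val_mask:     (N,) boolean mask of validation nodes
        """
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        confidence = probs.max(axis=1)

        # Quantile bins over similarity and confidence define the node groups.
        sim_edges = np.quantile(neighbor_sim, np.linspace(0, 1, n_sim_bins + 1)[1:-1])
        conf_edges = np.quantile(confidence, np.linspace(0, 1, n_conf_bins + 1)[1:-1])
        group = np.digitize(neighbor_sim, sim_edges) * n_conf_bins + np.digitize(confidence, conf_edges)

        temps = np.ones(n_sim_bins * n_conf_bins)
        for g in range(len(temps)):
            idx = np.where(val_mask & (group == g))[0]
            if len(idx) == 0:
                continue
            # Pick the temperature that minimizes validation NLL for this group.
            def nll(t):
                z = logits[idx] / t
                z = z - z.max(axis=1, keepdims=True)
                logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
                return -logp[np.arange(len(idx)), labels[idx]].mean()
            candidates = np.linspace(0.5, 3.0, 26)
            temps[g] = candidates[int(np.argmin([nll(t) for t in candidates]))]

        return logits / temps[group][:, None]  # calibrated logits for every node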


Arxiv

Distributionally Robust Policy Learning under Concept Drifts

  • Distributionally robust policy learning aims to find a policy that performs well under the worst-case distributional shift.
  • Existing methods for robust policy learning consider the worst-case joint distribution of the covariate and the outcome, which can be unnecessarily conservative.
  • This paper focuses on robust policy learning under concept drift, where only the conditional relationship between the outcome and the covariate changes.
  • The paper proposes a learning algorithm that maximizes the estimated policy value within a given policy class and attains an optimal sub-optimality gap; one standard way of writing the robust objective is given after this list.
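
The summary does not state the formal objective; under one standard formulation (an assumption here, so the paper’s exact uncertainty set and notation may differ), the covariate distribution stays fixed while the conditional outcome distribution may drift within an f-divergence ball of radius delta, and the learner maximizes the worst-case policy value:

    % Y(a): potential outcome under action a; P_X: fixed covariate distribution;
    % P_0(. | x): nominal conditional outcome distribution; Pi: policy class.
    \hat{\pi} \;=\; \arg\max_{\pi \in \Pi}\;
      \min_{\,Q:\ D_f\left(Q(\cdot \mid x)\,\|\,P_0(\cdot \mid x)\right)\,\le\,\delta\ \ \forall x}\;
      \mathbb{E}_{X \sim P_X}\,\mathbb{E}_{Y \sim Q(\cdot \mid X)}\!\left[\,Y\big(\pi(X)\big)\,\right]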


Arxiv

The Multiplex Classification Framework: optimizing multi-label classifiers through problem transformation, ontology engineering, and model ensembling

  • A new approach called the Multiplex Classification Framework has been introduced to address the complexities of classification problems through problem transformation, ontology engineering, and model ensembling.
  • The framework offers adaptability to any number of classes and logical constraints, a method for managing class imbalance, elimination of confidence threshold selection, and a modular structure.
  • Experiments comparing the Multiplex approach with conventional classification models showed significant improvement in classification performance, especially in problems with a large number of classes and class imbalances.
  • However, the Multiplex approach requires an understanding of the problem domain and experience with ontology engineering, and it involves training multiple models.


Arxiv

Stealing That Free Lunch: Exposing the Limits of Dyna-Style Reinforcement Learning

  • Dyna-style off-policy model-based reinforcement learning (DMBRL) algorithms face a performance gap when applied across different benchmark environments.
  • While DMBRL algorithms perform well in OpenAI Gym, their performance drops significantly in DeepMind Control Suite (DMC) with proprioceptive observations.
  • Modern techniques designed to address issues in these settings do not consistently improve performance across all environments.
  • Adding synthetic rollouts to the training process, which is the backbone of Dyna-style algorithms (the generic loop is sketched after this list), significantly degrades performance in most DMC environments.
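
For readers unfamiliar with the mechanism the last bullet refers to, this is the generic Dyna-style loop rather than the paper’s code: a dynamics model is fit on real transitions, and short synthetic rollouts from that model are mixed into the data used for the policy update. world_model, policy, agent, and env_buffer are hypothetical components.

    def dyna_training_step(env_buffer, world_model, policy, agent,
                           n_synthetic=1000, rollout_len=5):
        """One Dyna-style update: learn the model from real data, then augment
        the update with short synthetic rollouts branched from real states."""
        world_model.fit(env_buffer.sample(4096))          # learn dynamics from real transitions

        synthetic = []
        states = env_buffer.sample(n_synthetic)["state"]  # branch rollouts from real states
        for _ in range(rollout_len):
            actions = policy(states)
            next_states, rewards, dones = world_model.predict(states, actions)
            synthetic.append((states, actions, rewards, next_states, dones))
            states = next_states

        agent.update(real_batch=env_buffer.sample(256), model_batch=synthetic)
        return agent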


Arxiv

Covariances for Free: Exploiting Mean Distributions for Federated Learning with Pre-Trained Models

  • This research proposes a training-free method for federated learning with pre-trained models.
  • The method utilizes an unbiased estimator of class covariance matrices (the general idea is illustrated after this list).
  • It only requires the communication of class means, reducing communication costs.
  • The approach improves performance by 4-26% compared to existing methods with the same communication cost.
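
The paper’s exact estimator is not reproduced here. As an illustration of why covariances can come "for free" from means: if client k’s class mean averages n_k i.i.d. samples from that class, its covariance around the true class mean is Sigma / n_k, so the sample-count-weighted scatter of the client means has expectation (K - 1) * Sigma. The sketch below implements that textbook construction under the i.i.d. assumption; it is not necessarily the paper’s estimator.

    import numpy as np

    def covariance_from_client_means(means, counts):
        """Estimate one class's covariance from per-client class means (toy version).

        means:  (K, d) class means reported by K clients for one class
        counts: (K,)   number of samples behind each mean
        """
        means, counts = np.asarray(means, float), np.asarray(counts, float)
        global_mean = (counts[:, None] * means).sum(axis=0) / counts.sum()
        centered = means - global_mean
        # Weighted scatter of client means; dividing by K - 1 makes it unbiased
        # for the within-class covariance when client samples are i.i.d.
        scatter = (counts[:, None] * centered).T @ centered
        return scatter / (len(means) - 1)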


Arxiv

A Unifying Information-theoretic Perspective on Evaluating Generative Models

  • There is significant current research focused on determining meaningful evaluation metrics for generative models.
  • A unifying perspective is needed to allow for easier comparison and clearer explanation of metric benefits and drawbacks.
  • A class of kth-nearest-neighbors (kNN)-based metrics is unified under an information-theoretic lens.
  • A tri-dimensional metric composed of Precision Cross-Entropy (PCE), Recall Cross-Entropy (RCE), and Recall Entropy (RE) is proposed to measure fidelity and diversity.


Arxiv

Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference

  • Realtime environments change as agents perform action inference and learning, requiring high interaction frequencies to minimize regret.
  • Recent advances in machine learning involve larger neural networks with longer inference times, raising concerns about their applicability in realtime systems.
  • Proposed algorithms stagger asynchronous inference processes so that actions arrive at consistent time intervals, enabling the use of models with long inference times (a toy timing schedule is sketched after this list).
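
Not from the paper: a toy way to see the staggering idea is that with K = ceil(inference_time / step_time) asynchronous inference processes started one step apart, some process completes at every environment step, so the agent always has a fresh action even though a single forward pass spans several steps.

    import math

    def staggered_schedule(inference_time, step_time, horizon):
        """Toy schedule: worker (step mod K) starts inference at each step and its
        action is ready K steps later, so one action completes per step."""
        k = math.ceil(inference_time / step_time)
        schedule = []
        for step in range(horizon):
            worker = step % k          # worker that starts inference at this step
            ready_at = step + k        # its action becomes available k steps later
            schedule.append((step, worker, ready_at))
        return k, schedule

    k, plan = staggered_schedule(inference_time=0.8, step_time=0.1, horizon=5)
    # k == 8: eight staggered processes cover an inference that spans eight steps.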


Arxiv

ResQ: Mixed-Precision Quantization of Large Language Models with Low-Rank Residuals

  • ResQ is a post-training quantization (PTQ) method for large language models (LLMs).
  • ResQ uses principal component analysis (PCA) to identify a low-rank subspace with high activation variances.
  • Within this subspace, ResQ keeps the coefficients in high precision while quantizing the rest to 4-bit (a toy illustration of this split follows the list).
  • ResQ outperforms recent PTQ methods, achieving lower perplexity and faster inference on benchmarks.
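
Not ResQ’s actual algorithm; a minimal sketch of the split described above, assuming per-tensor symmetric fake quantization of the residual: PCA on calibration activations selects a low-rank, high-variance subspace whose coefficients are kept in full precision, while everything outside it is rounded to 4-bit levels.

    import numpy as np

    def resq_style_quantize(X_calib, X, rank=8):
        """Keep the top-`rank` principal components of activations in high precision
        and fake-quantize the residual to 4-bit (toy illustration)."""
        mu = X_calib.mean(axis=0)
        # PCA basis of the calibration activations (rows of Vt = principal directions).
        _, _, Vt = np.linalg.svd(X_calib - mu, full_matrices=False)
        U = Vt[:rank].T                                   # (d, rank) high-variance subspace

        hi = (X - mu) @ U                                 # high-precision coefficients
        residual = (X - mu) - hi @ U.T                    # everything outside the subspace
        scale = np.abs(residual).max() / 7 + 1e-12        # symmetric int4 range [-8, 7]
        residual_q = np.clip(np.round(residual / scale), -8, 7) * scale

        return hi @ U.T + residual_q + mu                 # dequantized approximation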


Arxiv

I0T: Embedding Standardization Method Towards Zero Modality Gap

  • Contrastive Language-Image Pretraining (CLIP) enables zero-shot inference in downstream tasks such as image-text retrieval and classification.
  • Recent works extending CLIP suffer from the issue of modality gap, which arises when the image and text embeddings are projected to disparate manifolds, deviating from the intended objective of image-text contrastive learning.
  • Researchers propose two methods to address the modality gap: (1) a post-hoc embedding standardization method, I0T_post, that reduces the modality gap to zero, and (2) a trainable method, I0T_async, that adds two normalization layers to each encoder to alleviate the modality gap (a toy post-hoc standardization in the same spirit is sketched after this list).
  • The I0T framework significantly reduces the modality gap while preserving the original embedding representations of trained models with their locked parameters and can serve as an alternative evaluation metric for CLIPScore.
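
The exact I0T_post procedure is defined in the paper; the sketch below only illustrates the general idea of post-hoc standardization under an assumed recipe (subtract each modality’s mean embedding, then re-normalize to unit length), which removes the constant offset between the image and text embedding clusters that creates the modality gap.

    import numpy as np

    def post_hoc_standardize(image_embs, text_embs):
        """Toy post-hoc reduction of the CLIP modality gap: remove each modality's
        mean embedding, then re-normalize to unit length for cosine similarity."""
        def center_and_norm(E, mean):
            E = E - mean
            return E / np.linalg.norm(E, axis=1, keepdims=True)

        img_mean = image_embs.mean(axis=0)
        txt_mean = text_embs.mean(axis=0)
        return center_and_norm(image_embs, img_mean), center_and_norm(text_embs, txt_mean)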


Arxiv

Balanced Gradient Sample Retrieval for Enhanced Knowledge Retention in Proxy-based Continual Learning

  • Continual learning in deep neural networks often suffers from catastrophic forgetting, where representations for previous tasks are overwritten during subsequent training.
  • A novel sample retrieval strategy is proposed that leverages both gradient-conflicting and gradient-aligned samples to retain knowledge about past tasks.
  • Gradient-conflicting samples are selected to reduce interference and re-align gradients, preserving past task knowledge (a toy retrieval rule combining both sample types is sketched after this list).
  • Experiments validate the method's state-of-the-art performance in mitigating forgetting and maintaining competitive accuracy on new tasks.
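
Not the authors’ code; a minimal sketch of one plausible selection rule matching the description above: score each stored sample by the dot product between its gradient and the current task gradient, then retrieve a balanced mix of the most conflicting (negative) and most aligned (positive) samples for replay. grad_fn is a hypothetical per-sample gradient helper.

    import numpy as np

    def retrieve_balanced_samples(memory, current_grad, grad_fn, k=32):
        """Pick k/2 gradient-conflicting and k/2 gradient-aligned memory samples.

        memory:       list of stored (x, y) examples from past tasks
        current_grad: flattened gradient of the loss on the current batch
        grad_fn:      hypothetical helper returning a flattened per-sample gradient
        """
        scores = np.array([float(np.dot(grad_fn(x, y), current_grad)) for x, y in memory])
        order = np.argsort(scores)
        conflicting = order[: k // 2]          # most negative dot products
        aligned = order[-(k // 2):]            # most positive dot products
        return [memory[i] for i in np.concatenate([conflicting, aligned])]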

