techminis

A naukri.com initiative

ML News

Source: Arxiv

Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts

  • A study explores hierarchical meta-learning in dynamical system reconstruction (DSR) using a Mixture of Experts (MoE) approach.
  • Conventional MoEs face challenges in hierarchical DSR because of slow adaptation and conflicting routing, so the authors introduce a new method called MixER.
  • MixER is a sparse top-1 MoE layer with a custom gating-update algorithm based on K-means and least squares, enabling more effective training and better scalability.
  • Experiments validate MixER's efficiency and scalability in handling systems with up to ten parametric ordinary differential equations.
  • However, MixER falls short compared to existing meta-learners in scenarios with abundant data, especially when each expert processes only a fraction of a dataset with closely related data points.
  • Analysis with synthetic and neuroscientific time series data indicates that MixER's performance is influenced by the presence of hierarchical structure in the data.
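As a rough illustration of the gating idea above (not the authors' implementation), the sketch below clusters per-system features with a plain 1-D K-means step and then routes each input to its single nearest centroid, top-1 style. The feature values and the helper names `kmeans_1d` and `top1_route` are all hypothetical.

```python
import random

def kmeans_1d(xs, k, iters=20, seed=0):
    """Plain 1-D K-means: returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(xs, k)
    assign = [0] * len(xs)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [min(range(k), key=lambda j: abs(x - centroids[j])) for x in xs]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [x for x, a in zip(xs, assign) if a == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return centroids, assign

def top1_route(x, centroids):
    """Sparse top-1 gating: route input x to the single nearest expert."""
    return min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))

# Two well-separated groups of per-system features -> two experts.
features = [0.1, 0.2, 0.15, 5.0, 5.1, 4.9]
cents, assign = kmeans_1d(features, k=2)
assert top1_route(0.12, cents) != top1_route(5.05, cents)
```

The hierarchical structure the paper studies would correspond to routing whole families of related systems to the same expert.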

On the Importance of Embedding Norms in Self-Supervised Learning

  • Self-supervised learning (SSL) has become essential in machine learning for training data representations without a supervised signal.
  • Most SSL methods use the cosine similarity between embedding vectors, effectively embedding the data on a hypersphere.
  • Recent works suggest that embedding norms play a role in SSL, contrary to previous beliefs.
  • This paper resolves the contradiction and establishes the role of embedding norms in SSL training.
  • Theoretical analysis, simulations, and experiments show that embedding norms affect SSL convergence rates and network confidence.
  • Smaller embedding norms correspond to samples the network finds unexpected.
  • Manipulating embedding norms can significantly impact convergence speed in SSL.
  • The study highlights the importance of embedding norms in understanding and optimizing network behavior in SSL.
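A minimal sketch of the distinction the bullets draw: cosine similarity normalizes embeddings onto the hypersphere and so discards their norms, yet the norms remain available as a separate signal. The vectors and the "familiar"/"unexpected" labels below are invented for illustration.

```python
import math

def norm(v):
    """Euclidean norm of an embedding vector."""
    return math.sqrt(sum(x * x for x in v))

def cosine(u, v):
    """Cosine similarity: direction only, norm information is divided out."""
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

# Two embeddings pointing the same way but with different norms:
e_familiar = [3.0, 4.0]    # norm 5.0 -> hypothetically a "familiar" sample
e_unexpected = [0.3, 0.4]  # norm 0.5 -> hypothetically an "unexpected" sample

# Cosine similarity is blind to the norm difference...
assert abs(cosine(e_familiar, e_unexpected) - 1.0) < 1e-9
# ...but the norms themselves still carry the information the paper studies.
assert norm(e_familiar) > norm(e_unexpected)
```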

NestQuant: Nested Lattice Quantization for Matrix Products and LLMs

  • NestQuant is a new post-training quantization (PTQ) method for efficient deployment of large language models (LLMs), based on self-similar nested lattices.
  • The underlying scheme is shown to be information-theoretically optimal for low-precision matrix multiplication, and a practical low-complexity version based on the Gosset lattice is provided.
  • It is a drop-in quantizer for any matrix multiplication step in LLMs, like self-attention, MLP, etc.
  • NestQuant quantizes weights, KV-cache, and activations of Llama-3-8B model to 4 bits, achieving a perplexity of 6.6 on wikitext2.
  • This closes more than 55% of the perplexity gap to the unquantized model, outperforming state-of-the-art methods such as Meta's SpinQuant, OstQuant, and QuaRot.
  • Tests on larger models (up to 70B) and various LLM evaluation benchmarks consistently show NestQuant's superiority.
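As a toy illustration only, the sketch below quantizes a scalar with a fine integer lattice nested inside a coarser one; the value is split into a coarse-lattice point plus a fine-lattice residual (a coset index). Real NestQuant operates on vectors with the 8-dimensional Gosset lattice, which this deliberately does not reproduce.

```python
def lattice_quantize(x, step):
    """Round x to the nearest point of the scaled integer lattice step*Z."""
    return step * round(x / step)

def nested_quantize(x, fine_step, ratio):
    """Toy 1-D nested-lattice quantizer: the coarse lattice (fine_step*ratio)*Z
    is a sublattice of the fine lattice fine_step*Z. The codeword is a
    coarse-lattice point plus a fine-lattice residual within one coarse cell,
    mimicking how nested lattice codes split shaping from fine resolution."""
    coarse_step = fine_step * ratio
    coarse = lattice_quantize(x, coarse_step)
    residual = lattice_quantize(x - coarse, fine_step)
    return coarse, residual, coarse + residual

coarse, residual, xq = nested_quantize(3.37, fine_step=0.25, ratio=4)
assert abs(xq - 3.37) <= 0.125 + 1e-9  # error bounded by half the fine step
```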

A Closer Look at TabPFN v2: Understanding Its Strengths and Extending Its Capabilities

  • TabPFN v2, a transformer-based model for tabular datasets, excels in in-context learning performance across various datasets.
  • It handles dataset heterogeneity by inferring attribute relationships on the fly, removing the need for dataset-specific attribute embeddings.
  • TabPFN v2 can function as a feature extractor, creating a highly separable feature space for accurate predictions.
  • The model's limitations in handling high-dimensional, many-category, and large-scale tasks can be mitigated through a test-time divide-and-conquer strategy.
  • This study provides insights into TabPFN v2's success and proposes strategies to extend its usability for future tabular foundation models.
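One plausible reading of the test-time divide-and-conquer strategy (a sketch under assumptions, not the paper's algorithm) is to split a high-dimensional row into feature blocks, run the base model on each block, and ensemble the per-block outputs; `toy_predict` below stands in for the real model.

```python
def chunk_columns(n_features, max_features):
    """Split column indices into blocks of at most max_features each."""
    return [list(range(i, min(i + max_features, n_features)))
            for i in range(0, n_features, max_features)]

def divide_and_conquer_predict(predict_fn, x_row, max_features):
    """Hypothetical test-time strategy: predict on each feature block
    separately, then average the per-block probabilities."""
    blocks = chunk_columns(len(x_row), max_features)
    probs = [predict_fn([x_row[j] for j in block]) for block in blocks]
    return sum(probs) / len(probs)

# Stand-in predictor: clamp the feature mean into [0, 1] as P(class 1).
toy_predict = lambda feats: min(1.0, max(0.0, sum(feats) / len(feats)))

p = divide_and_conquer_predict(toy_predict, [0.2, 0.4, 0.6, 0.8], 2)
assert 0.0 <= p <= 1.0
```

The same chunking idea would apply to many-category tasks by splitting the label set instead of the feature set.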

Mechanistic PDE Networks for Discovery of Governing Equations

  • Mechanistic PDE Networks is a model for discovering governing partial differential equations from data.
  • It represents spatiotemporal data as space-time-dependent linear partial differential equations over neural-network hidden representations.
  • These latent PDEs are solved and decoded for specific tasks, so the spatiotemporal dynamics of the data are expressed in the network's hidden space.
  • Solving the PDE representations in a compute and memory-efficient manner is a key challenge.
  • A native, GPU-capable, parallel, sparse, and differentiable multigrid solver is developed for linear PDEs within Mechanistic PDE Networks.
  • This solver acts as a module to handle linear PDEs efficiently.
  • The architecture can discover nonlinear PDEs in complex scenarios while being robust to noise, leveraging the PDE solver.
  • PDE discovery is validated on various equations including reaction-diffusion and Navier-Stokes equations.

Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis

  • Recent advancements in Web AI agents have shown impressive abilities in handling complex web navigation tasks.
  • Web AI agents are found to be more vulnerable than standalone Large Language Models (LLMs) despite being based on the same safety-aligned models.
  • This vulnerability arises due to the increased flexibility of Web AI agents, potentially exposing them to a broader range of adversarial inputs.
  • A study aims to understand and address the factors contributing to the enhanced vulnerability of Web AI agents.
  • Differences in design between Web AI agents and standalone LLMs, together with the complex signals agents must process, contribute to the increased vulnerability.
  • Simple evaluation metrics like success rate may not adequately capture the nuances that make Web AI agents more vulnerable.
  • The study proposes a component-level analysis and a detailed evaluation framework to address these challenges.
  • Three critical factors amplifying the vulnerability of Web AI agents are identified: embedding user goals, multi-step action generation, and observational capabilities.
  • Enhancing security and robustness in AI agent design is crucial, as highlighted by the findings of this study.
  • Actionable insights are provided for developing targeted defense strategies to improve the security of Web AI agents.

FinTSBridge: A New Evaluation Suite for Real-world Financial Prediction with Advanced Time Series Models

  • Despite the focus on time series forecasting, challenges persist in applying models to financial asset pricing.
  • A new evaluation suite called FinTSBridge aims to bridge the gap between time series forecasting models and financial asset pricing.
  • The authors constructed financial datasets, tested over ten time series models, and introduced new metrics such as msIC and msIR.
  • The suite evaluates models' performance in financial tasks to gauge their practical applicability in financial scenarios.
  • FinTSBridge is intended to offer valuable insights into the effectiveness and robustness of advanced forecasting models in financial domains.
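For context on the metrics, the classic information coefficient (IC) is the correlation between predicted and realized returns; msIC and msIR are the paper's own variants, so the sketch below shows only the standard quantity they presumably build on.

```python
from statistics import mean

def information_coefficient(pred, realized):
    """Classic IC: Pearson correlation between predicted and realized returns.
    (msIC/msIR in the paper are the authors' multi-step refinements of this.)"""
    mp, mr = mean(pred), mean(realized)
    cov = sum((p - mp) * (r - mr) for p, r in zip(pred, realized))
    sd_p = sum((p - mp) ** 2 for p in pred) ** 0.5
    sd_r = sum((r - mr) ** 2 for r in realized) ** 0.5
    return cov / (sd_p * sd_r)

# Perfectly monotone predictions give IC = 1.
assert abs(information_coefficient([1, 2, 3], [2, 4, 6]) - 1.0) < 1e-9
```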

Whoever Started the Interference Should End It: Guiding Data-Free Model Merging via Task Vectors

  • Model merging aims to integrate task-specific expert models into a unified architecture while maintaining multi-task generalization capabilities.
  • Parameter interference between models often leads to reduced performance.
  • Resolving interference without extra data or computations during testing is a challenge.
  • The paper suggests minimizing interference by utilizing task vectors in the linear layer.
  • A method called WUDI-Merging is proposed, focusing on eliminating interference without additional data or rescaling coefficients.
  • Empirical evaluations across vision and language benchmarks show the effectiveness of the method in data-free model merging.
  • WUDI-Merging surpasses baseline methods by an average improvement of 10.9% and even outperforms mainstream test-time adaptation approaches by 3.3%.
  • The method exhibits superior performance while requiring minimal computing resources.
  • The code for WUDI-Merging will be made publicly available soon.
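The task-vector arithmetic the bullets rely on can be sketched as follows; the merge rule here is the naive sum of task vectors, not WUDI-Merging's interference-minimizing update, and all weights are invented.

```python
def task_vector(base, finetuned):
    """Task vector: element-wise difference between finetuned and base weights."""
    return [f - b for b, f in zip(base, finetuned)]

def merge(base, experts):
    """Naive data-free merge: add every expert's task vector to the base.
    Interference arises exactly when the task vectors overlap on the same
    coordinates; WUDI-Merging's contribution is minimizing that overlap."""
    vectors = [task_vector(base, e) for e in experts]
    return [b + sum(v[i] for v in vectors) for i, b in enumerate(base)]

base = [1.0, 1.0]
expert_a = [1.5, 1.0]   # only touches weight 0
expert_b = [1.0, 0.5]   # only touches weight 1
merged = merge(base, [expert_a, expert_b])
assert merged == [1.5, 0.5]  # disjoint updates merge without interference
```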

Dataset Properties Shape the Success of Neuroimaging-Based Patient Stratification: A Benchmarking Analysis Across Clustering Algorithms

  • Neuroimaging-based patient stratification holds promise for precision neuropsychiatry.
  • Dataset characteristics such as cluster separation, size imbalance, noise, and disease-related effects influence clustering algorithm success.
  • Four widely used stratification algorithms were evaluated on synthetic brain-morphometry cohorts.
  • Data complexity was found to be more crucial than the choice of algorithm for successful stratification.
  • Well-separated clusters yielded high accuracy, while overlapping or unequal-sized clusters reduced accuracy.
  • SuStaIn had scaling limitations; HYDRA's accuracy varied with data heterogeneity; SmileGAN and SurrealGAN detected patterns but did not assign discrete labels.
  • The study stresses the importance of dataset properties in shaping algorithm success and calls for realistic dataset distributions in algorithm development.

Towards Unified and Lossless Latent Space for 3D Molecular Latent Diffusion Modeling

  • 3D molecule generation is vital for drug discovery and material science, requiring models to handle complex multi-modalities.
  • An important challenge is integrating modalities like atom types, chemical bonds, and 3D coordinates while maintaining SE(3) equivariance for 3D coordinates.
  • Existing methods often use separate latent spaces for different modalities, affecting training and sampling efficiency.
  • A Unified Variational Auto-Encoder for 3D Molecular Latent Diffusion Modeling (UAE-3D) is proposed to address this challenge.
  • UAE-3D compresses 3D molecules into a unified latent space with near-zero reconstruction error, simplifying handling of multi-modalities.
  • The unified latent space enables efficient latent diffusion modeling without the complexities of multi-modality handling.
  • The Diffusion Transformer, a molecular-inductive-bias-free diffusion model, is used for latent generation.
  • Extensive experiments on GEOM-Drugs and QM9 datasets show that UAE-3D sets new benchmarks in de novo and conditional 3D molecule generation.
  • On GEOM-Drugs, UAE-3D reduces FCD by 72.6% relative to the previous best result and achieves over 70% average relative improvement in geometric fidelity.

Chem42: a Family of chemical Language Models for Target-aware Ligand Generation

  • Chem42 is a family of generative chemical Language Models designed to create novel ligands tailored to specific biological targets.
  • Most chemical Language Models fail to incorporate target-specific insights for de-novo ligand generation.
  • Chem42 integrates atomic-level interactions with inputs from Prot42, a protein Language Model, to improve molecular structure understanding and binding patterns.
  • The framework of Chem42 enables the creation of valid ligands with enhanced target specificity.
  • Evaluations show that Chem42 excels in chemical validity, target-aware design, and predicted binding affinity compared to existing approaches.
  • Chem42 could streamline the drug discovery process by reducing the search space for potential drug candidates.
  • The Chem42 models are available to the public at huggingface.co/inceptionai.

RocketPPA: Code-Level Power, Performance, and Area Prediction via LLM and Mixture of Experts

  • RocketPPA is a new tool that predicts power, performance, and area (PPA) metrics directly at the code-level abstraction using HDL code as input.
  • It utilizes an LLM-based regression model that integrates a large language model (LLM) with a mixture-of-experts (MoE) architecture composed of multilayer perceptrons (MLPs).
  • The LLM interprets the input HDL code and uses its final hidden-layer representations to predict PPA metrics. Low-rank adaptation (LoRA) enables efficient LLM training.
  • RocketPPA includes an LLM-based HDL code repair framework to generate a synthesizable training dataset.
  • Experimental results show that RocketPPA significantly improves accuracy in PPA estimation compared to previous methods like Llama3-MetRex-8B.
  • At a 10% relative error threshold, RocketPPA enhances area prediction pass rate by 13.6%, delay by 9.4%, and power by 14.7%.
  • At a 20% threshold, RocketPPA improves area prediction by 9.6%, delay by 10.8%, and power by 18.5%.
  • RocketPPA achieves over 20x speedup compared to MetRex and 30x over MasterRTL in processing the test set.
  • RocketPPA's impact lies in potentially speeding up hardware design by providing accurate PPA estimates early, reducing manual feature-engineering overhead and time-consuming synthesis flows.
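A minimal sketch of the LoRA idea mentioned above: the frozen weight W is augmented with a trainable low-rank product AB, so the layer computes y = x(W + AB) while only A and B receive gradients. The tiny matrices are hypothetical and nothing here reflects RocketPPA's actual architecture.

```python
def matmul(A, B):
    """Naive dense matrix multiply on lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_forward(x, W, A, B):
    """LoRA-style layer: y = x(W + AB), with A (d x r) and B (r x d)
    as the only trainable parameters; W stays frozen."""
    delta = matmul(A, B)
    W_eff = [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(W, delta)]
    return matmul([x], W_eff)[0]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen pretrained weight (identity here)
A = [[1.0], [0.0]]            # rank-1 adapter factors
B = [[0.0, 2.0]]
y = lora_forward([1.0, 1.0], W, A, B)
assert y == [1.0, 3.0]
```

With rank r much smaller than d, the adapter adds only 2rd parameters per d x d layer, which is what makes LLM fine-tuning for a regression head affordable.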

Coil2Coil: Self-supervised MR image denoising using phased-array coil images

  • Denoising magnetic resonance images is crucial for improving low signal-to-noise ratio images.
  • Deep neural networks have shown promise for denoising, but most methods rely on supervised learning, which needs clean and noise-corrupted image pairs for training.
  • Acquiring training images, especially clean ones, is costly and time-consuming.
  • To address this, the Coil2Coil (C2C) method, a self-supervised denoising approach, has been proposed.
  • C2C does not require clean images or paired noise-corrupted images for training.
  • Instead, it uses multichannel data from phased-array coils to create training images.
  • C2C divides and combines multichannel coil images into input and label images and processes them for training using Noise2Noise (N2N) principles.
  • During testing, C2C can denoise coil-combined images like DICOM images, making it widely applicable.
  • In synthetic noise-added image evaluations, C2C outperformed other self-supervised methods and matched supervised methods in performance.
  • When denoising real DICOM images, C2C effectively removed noise without leaving residual errors.
  • The method is advantageous for clinical applications as it eliminates the need for additional scans for clean or noise-corrupted image pairs.
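The channel-splitting step can be sketched as below (a toy with single-pixel "images"; real coil combination is more involved): the coil channels are divided into two disjoint groups whose combined images share the underlying signal but have independent noise, giving a Noise2Noise-style (input, label) training pair.

```python
def c2c_pairs(coil_images):
    """Toy Coil2Coil-style split: divide the coil channels into two disjoint
    groups and coil-combine each group by averaging, yielding an
    (input, label) pair with independent noise, as Noise2Noise requires."""
    half = len(coil_images) // 2
    combine = lambda imgs: [sum(px) / len(imgs) for px in zip(*imgs)]
    return combine(coil_images[:half]), combine(coil_images[half:])

# Four single-pixel "coil images" of the same underlying signal 1.0
coils = [[1.1], [0.9], [1.2], [0.8]]
inp, label = c2c_pairs(coils)
assert abs(inp[0] - 1.0) < 0.2 and abs(label[0] - 1.0) < 0.2
```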

Tractable hierarchies of convex relaxations for polynomial optimization on the nonnegative orthant

  • Researchers propose a hierarchy of semidefinite relaxations for polynomial optimization problems (POP) on the nonnegative orthant.
  • POP on a semialgebraic set in the nonnegative orthant can be converted to an equivalent form by squaring each variable, allowing for easier computations.
  • The proposed hierarchy is based on extending Pólya's Positivstellensatz by Dickinson-Povh, introducing even symmetry and factor width concepts.
  • A key feature of the new hierarchy is the ability to choose the maximal matrix size of each semidefinite relaxation arbitrarily.
  • The sequence of values obtained by the hierarchy converges to the optimal value of the original POP at a rate of O(ε^(-c)), provided the semialgebraic set has a nonempty interior.
  • The method is applied to tasks such as robustness certification of multi-layer neural networks and computation of positive maximal singular values.
  • Compared to the Moment-SOS hierarchy, the proposed method offers better bounds and significantly faster computation times, running several hundred times faster.
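The squaring substitution in the second bullet can be written out as follows (standard notation, assuming S denotes the semialgebraic feasible set):

```latex
\min_{x \in S \cap \mathbb{R}^n_{\ge 0}} f(x)
\;=\;
\min_{\, z \in \mathbb{R}^n \,:\, (z_1^2,\dots,z_n^2) \in S} f(z_1^2,\dots,z_n^2),
```

so the nonnegativity constraints disappear and the resulting problem has the even symmetry in z that the hierarchy exploits.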

Simulation-based Inference for High-dimensional Data using Surjective Sequential Neural Likelihood Estimation

  • Neural likelihood estimation methods for simulation-based inference face issues with high-dimensional data.
  • A new method called Surjective Sequential Neural Likelihood (SSNL) estimation is introduced for simulation-based inference.
  • SSNL utilizes a dimensionality-reducing surjective normalizing flow model as a surrogate likelihood function.
  • It enables computational inference via Markov chain Monte Carlo or variational Bayes methods.
  • SSNL eliminates the need for manual crafting of summary statistics for high-dimensional data inference.
  • The method is evaluated on various experiments and surpasses or matches state-of-the-art techniques.
  • SSNL proves to be a promising option for simulation-based inference on high-dimensional data sets.
