techminis
A naukri.com initiative

ML News

Source: Arxiv

Amortized Inference of Causal Models via Conditional Fixed-Point Iterations

  • Structural Causal Models (SCMs) help reason about interventions and support out-of-distribution generalization in scientific discovery.
  • Learning SCMs from observed data is challenging, typically necessitating a separate model for each dataset.
  • This work introduces amortized inference of SCMs by training a single model on multiple datasets from different SCMs.
  • A transformer-based architecture learns dataset embeddings, and the Fixed-Point Approach (FiP) is extended to infer SCMs conditioned on these embeddings.
  • The proposed method enables generation of observational and interventional data from new SCMs at inference time without any parameter updates (a minimal sketch follows the list).
  • Empirically, the amortized procedure is competitive with baselines on in- and out-of-distribution problems and outperforms them when data is limited.
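
A minimal sketch of the inference-time sampling idea, assuming a toy random network as a stand-in for the learned fixed-point function and a random vector in place of the transformer's dataset embedding (all names and sizes below are hypothetical, not the paper's FiP architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, h = 3, 8, 16   # observed variables, embedding size, hidden width (hypothetical)
W1 = rng.normal(scale=0.3, size=(h, 2 * d + k))
W2 = rng.normal(scale=0.3, size=(d, h))
e = rng.normal(size=k)  # dataset embedding; in the paper this comes from a transformer

def f(x, u, e):
    # Toy stand-in for the learned SCM map conditioned on the dataset embedding.
    return W2 @ np.tanh(W1 @ np.concatenate([x, u, e]))

def sample(intervene=None, iters=50):
    u = rng.normal(size=d)       # exogenous noise
    x = np.zeros(d)
    for _ in range(iters):       # fixed-point iteration x <- f(x, u, e)
        x = f(x, u, e)
        if intervene is not None:
            idx, val = intervene # do-intervention: clamp one variable
            x[idx] = val
    return x

print("observational:", sample())
print("do(x0 = 2.0): ", sample(intervene=(0, 2.0)))
```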

Source: Arxiv

Revisiting the Equivalence of Bayesian Neural Networks and Gaussian Processes: On the Importance of Learning Activations

  • Gaussian Processes (GPs) are useful for modeling uncertainty with function-space priors, while Bayesian Neural Networks (BNNs) are more scalable but lack some GP advantages.
  • Efforts have been made to make BNNs behave like GPs, but previous solutions have limitations.
  • A study shows that using trainable activations is essential to map GP priors effectively to wide BNNs.
  • The closed-form 2-Wasserstein distance is used to optimize the reparameterized priors and activations efficiently (see the sketch after the list).
  • The method introduces trainable periodic activations for global stationarity and functional priors conditioned on GP hyperparameters for efficient model selection.
  • Empirical results demonstrate that the proposed method outperforms existing approaches and matches heuristic baselines while resting on stronger theoretical foundations.
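
The distance driving this optimization has a closed form when both distributions are Gaussian, which is the relevant case when matching a GP prior to a wide BNN. A small illustration (not the paper's code):

```python
import numpy as np
from scipy.linalg import sqrtm

def w2_squared(mu1, cov1, mu2, cov2):
    # W2^2 = ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov2^1/2 cov1 cov2^1/2)^1/2)
    s2 = sqrtm(cov2)
    cross = np.real(sqrtm(s2 @ cov1 @ s2))  # discard numerical imaginary residue
    return np.sum((mu1 - mu2) ** 2) + np.trace(cov1 + cov2 - 2 * cross)

print(w2_squared(np.zeros(2), np.eye(2), np.ones(2), 2 * np.eye(2)))
```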

Source: Arxiv

CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis

  • Integrating multimodal Electronic Health Records (EHR) data has potential for predicting clinical outcomes.
  • Previous work focused on temporal interactions within samples and fusion of information, overlooking critical temporal patterns across patients.
  • Identifying temporal patterns like abnormal vital signs and corresponding textual descriptions is crucial.
  • A Cross-Modal Temporal Pattern Discovery (CTPD) framework is introduced to extract cross-modal temporal patterns efficiently.
  • CTPD learns shared initial temporal-pattern representations and refines them with slot attention to produce temporal semantic embeddings (a bare-bones slot-attention sketch follows the list).
  • A contrastive TPNCE loss encourages cross-modal alignment of the learned patterns, complemented by two reconstruction losses.
  • Evaluations on 48-hour in-hospital mortality and 24-hour phenotype classification tasks using the MIMIC-III database highlight the superiority of the method.
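
To make the slot-attention step concrete, here is a bare-bones version that keeps only the core mechanism: attention is normalized across slots so slots compete for inputs, and each slot updates to a weighted mean of the inputs it claims. The learned projections and GRU update of full slot attention are omitted, and all shapes are hypothetical:

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def slot_attention(inputs, n_slots=4, n_iters=3, seed=0):
    """Bare-bones slot attention over a set of feature vectors."""
    n, d = inputs.shape
    slots = np.random.default_rng(seed).normal(size=(n_slots, d))
    for _ in range(n_iters):
        logits = inputs @ slots.T / np.sqrt(d)           # (n, n_slots)
        attn = softmax(logits, axis=1)                   # competition across slots
        attn = attn / (attn.sum(axis=0, keepdims=True) + 1e-8)
        slots = attn.T @ inputs                          # weighted-mean update
    return slots

feats = np.random.default_rng(1).normal(size=(32, 16))   # e.g. per-time-step features
print(slot_attention(feats).shape)                       # (4, 16) pattern embeddings
```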

Source: Arxiv

WaKA: Data Attribution using K-Nearest Neighbors and Membership Privacy Principles

  • Researchers introduce WaKA (Wasserstein K-nearest-neighbors Attribution), an attribution method that combines principles from LiRA and k-nearest neighbors classifiers.
  • WaKA measures the contribution of individual data points to a model's loss distribution without needing to sample subsets of the training set (an illustrative leave-one-out variant is sketched below).
  • It can be used as a membership inference attack (MIA) to assess privacy risks or for privacy influence measurement and data valuation.
  • WaKA bridges the gap between data attribution and MIA by distinguishing a data point's value from its privacy risk.
  • Self-attribution values in WaKA have a stronger correlation with attack success rates than a point's contribution to model generalization.
  • WaKA performs closely to LiRA in MIA tasks on k-NN classifiers but with better computational efficiency.
  • It demonstrates greater robustness than Shapley Values for data minimization tasks on imbalanced datasets.
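
As an illustration of the quantity involved (not the authors' estimator, which exploits k-NN structure precisely to avoid retraining), one can compare a k-NN model's held-out loss distribution with and without one training point using a 1-D Wasserstein distance; the data and k below are synthetic:

```python
import numpy as np
from scipy.stats import wasserstein_distance
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, Xt = rng.normal(size=(200, 5)), rng.normal(size=(100, 5))
y, yt = (X[:, 0] > 0).astype(int), (Xt[:, 0] > 0).astype(int)

def knn_losses(X_tr, y_tr, k=5):
    # Per-example log loss of a k-NN classifier on a held-out set.
    p = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr).predict_proba(Xt)
    return -np.log(np.clip(p[np.arange(len(yt)), yt], 1e-6, None))

base = knn_losses(X, y)
# Attribution of point i: how far the loss distribution moves when i is removed.
i = 0
loo = knn_losses(np.delete(X, i, axis=0), np.delete(y, i))
print("attribution of point 0:", wasserstein_distance(base, loo))
```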

Source: Arxiv

Network Dynamics-Based Framework for Understanding Deep Neural Networks

  • A theoretical framework is proposed to analyze learning dynamics in deep neural networks using dynamical systems theory.
  • The framework introduces order-preserving and non-order-preserving transformations at the neuron level to redefine linearity and nonlinearity.
  • Different transformation modes lead to unique weight vector organization, information extraction, and learning phases.
  • Transitions between phases, including phenomena like grokking, can occur during training.
  • The concept of attraction basins in sample and weight spaces is introduced to characterize generalization and structural stability.
  • Metrics based on neuron transformation modes and attraction basins help analyze learning model performance.
  • Hyperparameters like depth, width, learning rate, and batch size influence these metrics for model optimization.

Source: Arxiv

Generalized Lie Symmetries in Physics-Informed Neural Operators

  • Physics-informed neural operators (PINOs) are effective for learning solution operators of PDEs.
  • Recent research has shown that incorporating Lie point symmetry information can boost the training efficiency of PINOs.
  • Techniques like data, architecture, and loss augmentation are used to integrate Lie point symmetries.
  • However, traditional point symmetries can sometimes offer no training signal, limiting their effectiveness in certain problems.
  • To overcome this limitation, a novel loss augmentation strategy is proposed in this work.
  • The strategy leverages evolutionary representatives of point symmetries, a type of generalized symmetries of the underlying PDE.
  • Generalized symmetries provide a more extensive set of generators than standard symmetries, offering a more informative training signal.
  • By using evolutionary representatives, the performance of neural operators is enhanced, leading to better data efficiency and accuracy in training.

Source: Arxiv

PDE-Controller: LLMs for Autoformalization and Reasoning of PDEs

  • PDE-Controller is a framework that enables large language models (LLMs) to control systems governed by partial differential equations (PDEs).
  • The framework transforms informal natural language instructions into formal specifications, executes reasoning, and improves PDE control utility.
  • PDE-Controller includes datasets, math-reasoning models, and evaluation metrics, all of which required significant development effort.
  • The framework outperforms open-source and GPT models in reasoning, autoformalization, and program synthesis, achieving up to a 62% improvement in utility gain for PDE control.
  • By combining language generation with PDE systems, PDE-Controller shows the potential of LLMs in addressing scientific and engineering challenges.
  • All data, model checkpoints, and code related to PDE-Controller are available at https://pde-controller.github.io/.

Source: Arxiv

Anomaly Detection via Autoencoder Composite Features and NCE

  • Unsupervised anomaly detection is a challenging task commonly tackled with autoencoders and generative models.
  • Autoencoders are often used to model normal data distribution and identify anomalies by high reconstruction error.
  • The proposed approach decouples training into an autoencoder and a likelihood model learned with noise contrastive estimation (NCE).
  • NCE estimates a probability density for anomaly scoring in the joint space of the autoencoder's latent representation and reconstruction-quality features (a toy version is sketched below).
  • To improve NCE's false negative rate, reconstruction features are systematically varied during training to optimize the noise distribution.
  • Experimental assessments on multiple benchmark datasets show that the proposed approach matches the performance of leading anomaly detection algorithms.
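
A toy rendering of the scoring idea, assuming stand-in arrays for the autoencoder's latents and reconstruction errors, a logistic classifier against uniform noise as the NCE density-ratio estimator, and quadratic features (an assumption added here so a linear classifier can enclose the data region):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Stand-ins for autoencoder outputs on normal data: latent codes z, recon errors r.
z = rng.normal(size=(500, 8))
r = np.abs(rng.normal(scale=0.1, size=(500, 1)))
data = np.hstack([z, r])

# NCE: classify data against samples from a known noise density; the classifier
# then estimates the density ratio p_data / p_noise, usable as an anomaly score.
noise = rng.uniform(data.min(0), data.max(0), size=data.shape)
Xn = np.vstack([data, noise])
yn = np.r_[np.ones(len(data)), np.zeros(len(noise))]

def feats(x):
    return np.hstack([x, x ** 2])  # quadratic features (added assumption)

clf = LogisticRegression(max_iter=1000).fit(feats(Xn), yn)

def anomaly_score(x):
    return 1.0 - clf.predict_proba(feats(x.reshape(1, -1)))[0, 1]  # low P(data) = anomalous

print(anomaly_score(np.r_[np.zeros(8), 0.05]))    # typical point: low score
print(anomaly_score(np.r_[4 * np.ones(8), 1.0]))  # off-manifold point: high score
```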

Source: Arxiv

Bias Detection via Maximum Subgroup Discrepancy

  • Bias evaluation is crucial for ensuring AI systems are trustworthy by assessing data quality and AI outputs.
  • Classical metrics like Total Variation and Wasserstein distances have high sample complexities, leading to limitations in many practical scenarios.
  • A new distance metric called Maximum Subgroup Discrepancy (MSD) is proposed in this paper.
  • MSD measures the distance between two distributions as the largest discrepancy in subgroup frequency across feature subgroups (a brute-force illustration follows the list).
  • Despite an exponential number of subgroups, the sample complexity of MSD remains linear in the number of features, making it practical for real-world applications.
  • An algorithm based on Mixed-integer optimization (MIO) is introduced for evaluating the distance.
  • MSD is easily interpretable, facilitating bias identification and correction.
  • The paper also introduces a general bias-detection framework of MSDD distances, into which MSD fits naturally.
  • Empirical evaluations comparing MSD with other metrics demonstrate its effectiveness on real-world datasets.
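
A brute-force illustration of the definition on binary features; the paper's MIO algorithm exists precisely to avoid this exponential enumeration, and the data and injected shift below are synthetic:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
# Two samples over three binary features (e.g., demographic indicators).
A = rng.integers(0, 2, size=(1000, 3))
B = rng.integers(0, 2, size=(1000, 3))
B[:, 0] = np.maximum(B[:, 0], (rng.random(1000) < 0.2).astype(B.dtype))  # subgroup shift

def msd(A, B):
    """Brute-force Maximum Subgroup Discrepancy: the largest gap between the two
    samples in the frequency of any subgroup, where a subgroup is a conjunction
    of fixed feature values (None = feature left unconstrained)."""
    best = 0.0
    for mask in product([None, 0, 1], repeat=A.shape[1]):
        in_a = np.ones(len(A), bool)
        in_b = np.ones(len(B), bool)
        for j, v in enumerate(mask):
            if v is not None:
                in_a &= A[:, j] == v
                in_b &= B[:, j] == v
        best = max(best, abs(in_a.mean() - in_b.mean()))
    return best

print(msd(A, B))  # noticeably nonzero due to the injected shift
```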

Source: Arxiv

Discovering Physics Laws of Dynamical Systems via Invariant Function Learning

  • Researchers have developed a method called Disentanglement of Invariant Functions (DIF) to learn the underlying laws of dynamical systems governed by ordinary differential equations.
  • The key challenge was to discover intrinsic dynamics across multiple environments while avoiding environment-specific mechanisms.
  • The method addresses complex environments where changes extend beyond function coefficients to entirely different function forms.
  • For example, it can recover the natural motion of an ideal pendulum, α²·sin(θ_t), by observing pendulum dynamics in varied environments.
  • The problem is formulated as an invariant function learning task grounded in causal analysis.
  • A causal graph and an encoder-decoder hypernetwork are designed in the DIF method to disentangle invariant functions from environment-specific dynamics.
  • The method ensures the independence between extracted invariant functions and environments through an information-based principle.
  • Quantitative comparisons with meta-learning and invariant learning baselines on three ODE systems have shown the effectiveness and efficiency of the DIF method.
  • Symbolic regression explanation results demonstrate the framework's ability to uncover intrinsic laws.
  • The code for the method has been made available as part of the AIRS library on GitHub.

Source: Arxiv

QuEST: Stable Training of LLMs with 1-Bit Weights and Activations

  • One approach to reducing the costs of large language models (LLMs) is through the use of quantized or sparse representations for training or deployment.
  • While post-training compression methods are popular, there is interest in obtaining more accurate compressed models by directly training over such representations with Quantization-Aware Training (QAT).
  • A recent study suggested that models can be trained with QAT at 8-bit weights and activations while maintaining accuracy.
  • A new method called QuEST advances the state of the art, demonstrating optimality at 4 bits and stable convergence with weights and activations as low as 1 bit.
  • QuEST achieves this through accurate and fast quantization of weights and activations using Hadamard normalization and MSE-optimal fitting, plus a trust gradient estimator that minimizes the error between the noisy and full-precision gradients (a toy forward pass is sketched after the list).
  • Experiments show that QuEST induces stable scaling laws across various precisions and can be extended to sparse representations.
  • GPU kernel support is provided to efficiently execute models produced by QuEST.
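
A toy version of the forward quantization step only, assuming a simple max-based scale where QuEST fits an MSE-optimal one, and omitting the trust gradient estimator entirely:

```python
import numpy as np
from scipy.linalg import hadamard

def quantize(w, bits=4):
    n = len(w)                                # must be a power of two for hadamard()
    H = hadamard(n) / np.sqrt(n)              # orthonormal Hadamard rotation
    v = H @ w                                 # normalization step: spreads outliers
    levels = 2 ** (bits - 1)
    scale = np.abs(v).max() / (levels - 0.5)  # crude scale; QuEST fits MSE-optimal
    q = np.clip(np.round(v / scale), -levels, levels - 1) * scale
    return H.T @ q                            # rotate back to the original basis

w = np.random.default_rng(0).normal(size=64)
print("quantization MSE:", np.mean((w - quantize(w, bits=4)) ** 2))
```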

Source: Arxiv

Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts

  • A study explores hierarchical meta-learning in dynamical system reconstruction (DSR) using a Mixture of Experts (MoE) approach.
  • Conventional MoEs face challenges in hierarchical DSR due to slow updates and conflicted routing, motivating a new method called MixER.
  • MixER, a sparse top-1 MoE layer, incorporates a custom gating-update algorithm based on K-means and least squares for more effective training and scalability (a toy version follows the list).
  • Experiments validate MixER's efficiency and scalability in handling systems with up to ten parametric ordinary differential equations.
  • However, MixER falls short compared to existing meta-learners in scenarios with abundant data, especially when each expert processes only a fraction of a dataset with closely related data points.
  • Analysis with synthetic and neuroscientific time series data indicates that MixER's performance is influenced by the presence of hierarchical structure in the data.
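
A toy rendering of the gating idea, assuming linear experts fit by least squares and a K-means-style centroid update for the router; the class below is a hypothetical simplification, not the paper's layer:

```python
import numpy as np

rng = np.random.default_rng(0)

class MixER:
    """Toy sparse top-1 mixture: a K-means-style gate routes each input to one
    expert; each expert is refit by least squares on its assigned members."""
    def __init__(self, n_experts, d_in, d_out):
        self.centroids = rng.normal(size=(n_experts, d_in))
        self.experts = [np.zeros((d_in, d_out)) for _ in range(n_experts)]

    def route(self, X):
        d2 = ((X[:, None, :] - self.centroids[None]) ** 2).sum(-1)
        return d2.argmin(1)                     # top-1 expert per sample

    def fit(self, X, Y, iters=5):
        for _ in range(iters):
            g = self.route(X)
            for k in range(len(self.experts)):
                m = g == k
                if m.any():
                    self.centroids[k] = X[m].mean(0)  # K-means gate update
                    self.experts[k], *_ = np.linalg.lstsq(X[m], Y[m], rcond=None)

    def predict(self, X):
        return np.array([x @ self.experts[k] for x, k in zip(X, self.route(X))])

X = rng.normal(size=(200, 4)); Y = np.sin(X) @ rng.normal(size=(4, 2))
mix = MixER(3, 4, 2); mix.fit(X, Y)
print("train MSE:", np.mean((mix.predict(X) - Y) ** 2))
```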

Source: Arxiv

On the Importance of Embedding Norms in Self-Supervised Learning

  • Self-supervised learning (SSL) has become essential in machine learning for training data representations without a supervised signal.
  • Most SSL methods use the cosine similarity between embedding vectors, embedding data effectively on a hypersphere.
  • Recent works suggest that embedding norms play a role in SSL, contrary to previous beliefs.
  • This paper resolves the contradiction and establishes the role of embedding norms in SSL training.
  • Theoretical analysis, simulations, and experiments show that embedding norms affect SSL convergence rates and network confidence.
  • Smaller embedding norms correspond to samples the network finds unexpected.
  • Manipulating embedding norms can significantly change convergence speed in SSL (illustrated below).
  • The study highlights the importance of embedding norms in understanding and optimizing network behavior in SSL.
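
One way to see how norms enter: the gradient of the cosine similarity with respect to an embedding shrinks as the embedding's norm grows, tying norms to effective step sizes. A small numerical check (the analytic gradient is standard calculus, not code from the paper):

```python
import numpy as np

def cos_grad(z, w):
    """Gradient of cos(z, w) with respect to z:
    grad = (w/||w|| - cos * z/||z||) / ||z||  -- it shrinks as ||z|| grows."""
    nz, nw = np.linalg.norm(z), np.linalg.norm(w)
    c = z @ w / (nz * nw)
    return (w / nw - c * z / nz) / nz

rng = np.random.default_rng(0)
w, z = rng.normal(size=16), rng.normal(size=16)
for s in (0.5, 1.0, 4.0):
    g = cos_grad(s * z, w)   # same direction, scaled norm
    print(f"||z|| scaled by {s}: grad norm = {np.linalg.norm(g):.4f}")
```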

Source: Arxiv

NestQuant: Nested Lattice Quantization for Matrix Products and LLMs

  • NestQuant is a new post-training quantization (PTQ) method for efficient deployment of large language models (LLMs), based on self-similar nested lattices.
  • NestQuant is shown to be information-theoretically optimal for low-precision matrix multiplication, with a practical low-complexity version built on the Gosset lattice (a basic Gosset-lattice decoder is sketched after the list).
  • It is a drop-in quantizer for any matrix multiplication step in LLMs, like self-attention, MLP, etc.
  • NestQuant quantizes the weights, KV-cache, and activations of Llama-3-8B to 4 bits, achieving a perplexity of 6.6 on wikitext2.
  • This amounts to a more than 55% reduction in the perplexity gap to the unquantized model, outperforming state-of-the-art methods such as Meta's SpinQuant, OstQuant, and QuaRot.
  • Tests on larger models (up to 70B) and various LLM evaluation benchmarks consistently show NestQuant's superiority.
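
For intuition on the lattice ingredient: the Gosset lattice E8 is the union of D8 (integer vectors with even coordinate sum) and D8 shifted by one half in every coordinate, and its nearest-point decoder takes a few lines. The standard construction is sketched below; how NestQuant nests such lattices into self-similar codebooks is in the paper:

```python
import numpy as np

def nearest_d8(x):
    """Nearest point in D8 = {integer vectors with even coordinate sum}."""
    f = np.round(x)
    if int(f.sum()) % 2:                   # parity violated: re-round worst coordinate
        i = np.argmax(np.abs(x - f))
        f[i] += 1.0 if x[i] > f[i] else -1.0
    return f

def nearest_e8(x):
    """Nearest point in the Gosset lattice E8 = D8 union (D8 + 1/2)."""
    a = nearest_d8(x)
    b = nearest_d8(x - 0.5) + 0.5
    return a if np.sum((x - a) ** 2) <= np.sum((x - b) ** 2) else b

x = np.random.default_rng(0).normal(size=8)
print(x.round(2), "->", nearest_e8(x))
```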

Source: Arxiv

A Closer Look at TabPFN v2: Understanding Its Strengths and Extending Its Capabilities

  • TabPFN v2, a transformer-based model for tabular datasets, excels in in-context learning performance across various datasets.
  • The model addresses dataset heterogeneity without dataset-specific attribute embeddings by inferring attribute relationships effectively.
  • TabPFN v2 can function as a feature extractor, creating a highly separable feature space for accurate predictions.
  • The model's limitations on high-dimensional, many-category, and large-scale tasks can be mitigated through a test-time divide-and-conquer strategy (a generic sketch follows the list).
  • This study provides insights into TabPFN v2's success and proposes strategies to extend its usability for future tabular foundation models.
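
A generic sketch of such a strategy, assuming random feature chunks and probability averaging, with scikit-learn's LogisticRegression standing in for a TabPFN-style predictor (the paper's exact splitting scheme may differ):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression  # stand-in predictor

def divide_and_conquer_predict(X_tr, y_tr, X_te, n_chunks=4, seed=0):
    """Fit the predictor on random feature subsets and average class probabilities,
    so no single fit sees more features than the model handles well."""
    rng = np.random.default_rng(seed)
    cols = rng.permutation(X_tr.shape[1])
    probs = []
    for chunk in np.array_split(cols, n_chunks):
        clf = LogisticRegression(max_iter=1000).fit(X_tr[:, chunk], y_tr)
        probs.append(clf.predict_proba(X_te[:, chunk]))
    return np.mean(probs, axis=0)

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 40)); y = (X[:, :5].sum(1) > 0).astype(int)
p = divide_and_conquer_predict(X[:200], y[:200], X[200:])
print("accuracy:", ((p[:, 1] > 0.5).astype(int) == y[200:]).mean())
```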
