techminis

A naukri.com initiative

ML News

Source: Arxiv

DANCE: Deep Learning-Assisted Analysis of Protein Sequences Using Chaos Enhanced Kaleidoscopic Images

  • Cancer is a complex disease involving uncontrolled cell growth; T cell receptors (TCRs) play a crucial role in recognizing antigens, including those related to cancer.
  • Advances in sequencing technologies allow detailed profiling of TCR repertoires, which has led to the discovery of potent anti-cancer TCRs and the development of TCR-based immunotherapies.
  • Analyzing T-cell protein sequences is challenging because of their short lengths, so efficient representations are needed.
  • The proposed DANCE method generates chaos-enhanced kaleidoscopic images from protein sequences using Chaos Game Representation (CGR), applying chaos game rules around a central point (a minimal CGR sketch follows this list).
  • TCR sequences are transformed into images via DANCE, and deep-learning vision models classify them, linking visual patterns in the images to underlying protein properties.
  • The method is applied to classify TCR protein sequences associated with specific cancer cells, leveraging the immune response of TCRs against cancer.
  • Combining CGR-based image generation with deep-learning classification opens new possibilities for protein analysis, and the resulting insights may benefit TCR-based immunotherapies.
  • In conclusion, DANCE presents a novel and effective strategy for analyzing and classifying T-cell protein sequences, with implications for cancer research and immunotherapy development.
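
The summary does not spell out the chaos-game construction, so here is a minimal sketch of classic Chaos Game Representation for protein sequences, assuming the 20 standard amino acids placed on a unit circle; DANCE's kaleidoscopic variant adds its own rules around the central point, which are not reproduced here.

```python
import numpy as np

# Classic CGR for proteins: each residue pulls the walk halfway toward its
# vertex on a 20-gon. Vertex layout and image size are illustrative choices.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ANGLES = 2 * np.pi * np.arange(20) / 20
VERTICES = np.stack([np.cos(ANGLES), np.sin(ANGLES)], axis=1)  # 20 x 2

def cgr_points(sequence: str) -> np.ndarray:
    """2-D CGR trajectory of a protein sequence, starting at the center."""
    point, trail = np.zeros(2), []
    for residue in sequence:
        vertex = VERTICES[AMINO_ACIDS.index(residue)]
        point = (point + vertex) / 2          # the chaos-game halfway rule
        trail.append(point.copy())
    return np.array(trail)

def cgr_image(sequence: str, size: int = 64) -> np.ndarray:
    """Rasterize the trajectory into a size x size occupancy image."""
    pts = cgr_points(sequence)
    pix = ((pts + 1) / 2 * (size - 1)).astype(int)   # [-1, 1] -> pixel grid
    img = np.zeros((size, size))
    np.add.at(img, (pix[:, 1], pix[:, 0]), 1)        # count visits per pixel
    return img

img = cgr_image("CASSLGQAYEQYF")   # a toy CDR3-like TCR fragment
print(img.shape, img.sum())        # (64, 64), one count per residue
```

Images of this kind are what a standard vision model would then consume for classification.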


Source: Arxiv

Reevaluating Meta-Learning Optimization Algorithms Through Contextual Self-Modulation

  • Contextual Self-Modulation (CSM) is a regularization mechanism for Neural Context Flows (NCFs) that has demonstrated powerful meta-learning on physical systems.
  • CSM has limitations across different modalities and in high-data regimes.
  • Two extensions have been introduced in this work: iCSM, which expands CSM to infinite-dimensional variations, and StochasticNCF, which provides a low-cost approximation of meta-gradient updates.
  • The extensions were tested on tasks such as dynamical systems, computer vision challenges, and curve fitting problems.
  • Experiments with higher-order Taylor expansions show that they do not necessarily improve generalization.
  • CSM can be integrated into other meta-learning frameworks, as demonstrated with FlashCAVIA.
  • The study emphasizes the benefits of CSM for meta-learning and out-of-distribution tasks, particularly for physical systems.
  • An open-source library for integrating self-modulation into contextual meta-learning workflows is available at https://github.com/ddrous/self-mod.


Source: Arxiv

TimeDART: A Diffusion Autoregressive Transformer for Self-Supervised Time Series Representation

  • Self-supervised learning in time series analysis is gaining attention for reducing the need for labeled data and improving downstream tasks.
  • Current methods struggle to capture both long-term dynamic evolution and subtle local patterns effectively.
  • A new model called TimeDART is introduced, unifying two generative paradigms, autoregressive modeling and denoising diffusion, to learn transferable representations.
  • TimeDART uses a causal Transformer encoder with a patch-based embedding strategy to capture evolving trends from left to right (sketched after this list).
  • The model also employs a denoising diffusion process to capture fine-grained local patterns through forward diffusion and reverse denoising.
  • Optimization of the model is done in an autoregressive manner, effectively combining global and local sequence features.
  • Extensive experiments on public datasets show that TimeDART outperforms existing methods in time series forecasting and classification tasks.
  • The code for TimeDART is available at https://github.com/Melmaphother/TimeDART.
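
As a rough illustration of the autoregressive half of the pipeline, the sketch below combines patch-based embedding with a causally masked Transformer encoder. Patch length, model width, and depth are invented here, and TimeDART's diffusion branch and actual architecture are not reproduced.

```python
import torch
import torch.nn as nn

class CausalPatchEncoder(nn.Module):
    """Patch a univariate series and encode it with a causal Transformer."""

    def __init__(self, patch_len=16, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.patch_len = patch_len
        self.embed = nn.Linear(patch_len, d_model)   # one token per patch
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, x):                            # x: (batch, time)
        patches = x.unfold(1, self.patch_len, self.patch_len)  # (b, n, patch)
        tokens = self.embed(patches)
        n = tokens.size(1)
        # Causal mask: each patch attends only to itself and earlier patches,
        # matching the left-to-right trend modeling the summary describes.
        mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
        return self.encoder(tokens, mask=mask)

enc = CausalPatchEncoder()
print(enc(torch.randn(8, 128)).shape)   # (8, 8, 128): 8 patch tokens per series
```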


Source: Arxiv

Amortized Inference of Causal Models via Conditional Fixed-Point Iterations

  • Structural Causal Models (SCMs) help reason about interventions and support out-of-distribution generalization in scientific discovery.
  • Learning SCMs from observed data is challenging, typically necessitating a separate model for each dataset.
  • This work introduces amortized inference of SCMs by training a single model on multiple datasets from different SCMs.
  • A transformer-based architecture learns dataset embeddings, and the Fixed-Point Approach (FiP) is extended to infer SCMs conditioned on those embeddings (see the sketch after this list).
  • The proposed method enables the generation of observational and interventional data from new SCMs during inference without parameter updates.
  • Empirically, the amortized procedure is competitive with baselines on in-distribution and out-of-distribution problems and outperforms them when data is limited.
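
The summary stays high-level, but the two ingredients it names, dataset embeddings and conditional fixed-point iteration, can be caricatured as below. The encoder, the mean pooling, and the fixed-point map are assumptions for illustration, not the paper's FiP architecture.

```python
import torch
import torch.nn as nn

class DatasetEmbedder(nn.Module):
    """Permutation-invariant embedding of a whole dataset of samples."""

    def __init__(self, n_vars=5, d_model=64):
        super().__init__()
        self.proj = nn.Linear(n_vars, d_model)
        layer = nn.TransformerEncoderLayer(d_model, 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, 2)

    def forward(self, data):              # data: (batch, samples, n_vars)
        h = self.encoder(self.proj(data))  # no positional encoding on purpose
        return h.mean(dim=1)               # mean-pool -> order-invariant

def conditional_fixed_point(f, context, x0, iters=50):
    """Iterate x <- f(x, context) toward a fixed point, given the embedding."""
    x = x0
    for _ in range(iters):
        x = f(x, context)
    return x

embedder = DatasetEmbedder()
ctx = embedder(torch.randn(2, 100, 5))    # one embedding per dataset
# Toy contractive map: converges to x* = 0.2 * context feature.
x_star = conditional_fixed_point(lambda x, c: 0.5 * x + 0.1 * c[:, :1],
                                 ctx, torch.zeros(2, 1))
print(ctx.shape, x_star.shape)            # (2, 64) (2, 1)
```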


Source: Arxiv

Revisiting the Equivalence of Bayesian Neural Networks and Gaussian Processes: On the Importance of Learning Activations

  • Gaussian Processes (GPs) are useful for modeling uncertainty with function-space priors, while Bayesian Neural Networks (BNNs) are more scalable but lack some GP advantages.
  • Efforts have been made to make BNNs behave like GPs, but previous solutions have limitations.
  • A study shows that using trainable activations is essential to map GP priors effectively to wide BNNs.
  • The closed-form 2-Wasserstein distance is used for efficient optimization of reparameterized priors and activations (see the sketch after this list).
  • The method introduces trainable periodic activations for global stationarity and functional priors conditioned on GP hyperparameters for efficient model selection.
  • Empirical results demonstrate that the proposed method outperforms existing approaches and matches heuristic methods while resting on stronger theoretical foundations.
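
The closed-form 2-Wasserstein distance the summary mentions exists for Gaussian distributions; the sketch below implements that standard formula, which is presumably the quantity being optimized here, though the paper's full training loop is not reproduced.

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2_sq(m1, C1, m2, C2):
    """Squared 2-Wasserstein distance between N(m1, C1) and N(m2, C2):
    W2^2 = ||m1 - m2||^2 + Tr(C1 + C2 - 2 (C2^{1/2} C1 C2^{1/2})^{1/2})."""
    s2 = sqrtm(C2)
    cross = sqrtm(s2 @ C1 @ s2)           # may return complex with ~0 imag part
    return float(np.sum((m1 - m2) ** 2)
                 + np.trace(C1 + C2 - 2 * np.real(cross)))

m1, m2 = np.zeros(2), np.ones(2)
C1, C2 = np.eye(2), 2 * np.eye(2)
print(gaussian_w2_sq(m1, C1, m2, C2))     # 2 + 2*(3 - 2*sqrt(2)) ≈ 2.343
```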


Source: Arxiv

CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis

  • Integrating multimodal Electronic Health Records (EHR) data has potential for predicting clinical outcomes.
  • Previous work focused on temporal interactions within individual samples and on information fusion, overlooking temporal patterns shared across patients.
  • Identifying temporal patterns like abnormal vital signs and corresponding textual descriptions is crucial.
  • A Cross-Modal Temporal Pattern Discovery (CTPD) framework is introduced to extract cross-modal temporal patterns efficiently.
  • CTPD uses shared initial temporal pattern representations and slot attention to generate temporal semantic embeddings.
  • A contrastive TPNCE loss is introduced to align the learned patterns across modalities, along with two reconstruction losses (a generic contrastive sketch follows this list).
  • Evaluations on 48-hour in-hospital mortality and 24-hour phenotype classification tasks using the MIMIC-III database highlight the superiority of the method.
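
The exact form of the TPNCE loss is not given in the summary; the sketch below shows a generic InfoNCE-style cross-modal alignment between time-series and text embeddings, which is the family of losses the name suggests. The embedding names z_ts and z_txt are assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(z_ts, z_txt, temperature=0.1):
    """Symmetric InfoNCE: matched (time-series, text) pairs sit on the diagonal."""
    z_ts = F.normalize(z_ts, dim=-1)       # (batch, dim)
    z_txt = F.normalize(z_txt, dim=-1)
    logits = z_ts @ z_txt.t() / temperature
    targets = torch.arange(z_ts.size(0))
    # Pull each patient's two modalities together, push other patients apart.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

print(info_nce(torch.randn(16, 64), torch.randn(16, 64)))
```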


Source: Arxiv

WaKA: Data Attribution using K-Nearest Neighbors and Membership Privacy Principles

  • Researchers introduce WaKA (Wasserstein K-nearest-neighbors Attribution), an attribution method that combines principles from LiRA and k-nearest neighbors classifiers.
  • WaKA measures the contribution of individual data points to a model's loss distribution without needing to sample subsets of the training set (illustrated after this list).
  • It can be used as a membership inference attack (MIA) to assess privacy risks or for privacy influence measurement and data valuation.
  • WaKA bridges the gap between data attribution and MIA by distinguishing a data point's value from its privacy risk.
  • Self-attribution values in WaKA have a stronger correlation with attack success rates than a point's contribution to model generalization.
  • WaKA performs closely to LiRA in MIA tasks on k-NN classifiers but with better computational efficiency.
  • It demonstrates greater robustness than Shapley Values for data minimization tasks on imbalanced datasets.
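
As a caricature of the attribution idea, the sketch below scores a training point by the 1-D Wasserstein shift its removal induces in a k-NN classifier's per-sample loss distribution. WaKA itself computes such contributions more efficiently; this leave-one-out loop, the toy data, and k are assumptions.

```python
import numpy as np
from scipy.stats import wasserstein_distance
from sklearn.neighbors import KNeighborsClassifier

def loss_distribution(X_tr, y_tr, X_te, y_te, k=5):
    """Per-test-sample negative log-likelihood of a k-NN classifier."""
    model = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)
    return -np.log(proba[np.arange(len(y_te)), y_te] + 1e-12)

def waka_like_score(i, X_tr, y_tr, X_te, y_te):
    """Wasserstein shift of the loss distribution when point i is removed."""
    keep = np.arange(len(y_tr)) != i
    with_pt = loss_distribution(X_tr, y_tr, X_te, y_te)
    without = loss_distribution(X_tr[keep], y_tr[keep], X_te, y_te)
    return wasserstein_distance(with_pt, without)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 4)), rng.integers(0, 2, 200)
print(waka_like_score(0, X[:150], y[:150], X[150:], y[150:]))
```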


Source: Arxiv

Network Dynamics-Based Framework for Understanding Deep Neural Networks

  • A theoretical framework is proposed to analyze learning dynamics in deep neural networks using dynamical systems theory.
  • The framework introduces order-preserving and non-order-preserving transformations at the neuron level to redefine linearity and nonlinearity.
  • Different transformation modes lead to unique weight vector organization, information extraction, and learning phases.
  • Transitions between phases, including phenomena like grokking, can occur during training.
  • The concept of attraction basins in sample and weight spaces is introduced to characterize generalization and structural stability.
  • Metrics based on neuron transformation modes and attraction basins help analyze learning model performance.
  • Hyperparameters like depth, width, learning rate, and batch size influence these metrics for model optimization.


Source: Arxiv

Generalized Lie Symmetries in Physics-Informed Neural Operators

  • Physics-informed neural operators (PINOs) are effective for learning solution operators of PDEs.
  • Recent research has shown that incorporating Lie point symmetry information can boost the training efficiency of PINOs.
  • Techniques like data, architecture, and loss augmentation are used to integrate Lie point symmetries.
  • However, traditional point symmetries can sometimes offer no training signal, limiting their effectiveness in certain problems.
  • To overcome this limitation, a novel loss augmentation strategy is proposed in this work.
  • The strategy leverages evolutionary representatives of point symmetries, a type of generalized symmetries of the underlying PDE.
  • Generalized symmetries provide a more extensive set of generators than standard symmetries, offering a more informative training signal.
  • By using evolutionary representatives, the performance of neural operators is enhanced, leading to better data efficiency and accuracy in training.


Source: Arxiv

PDE-Controller: LLMs for Autoformalization and Reasoning of PDEs

  • PDE-Controller is a framework that enables large language models (LLMs) to control systems governed by partial differential equations (PDEs).
  • The framework transforms informal natural language instructions into formal specifications, executes reasoning, and improves PDE control utility.
  • The framework comprises datasets, math-reasoning models, and evaluation metrics, all of which required significant development effort.
  • It outperforms open-source and GPT models in reasoning, autoformalization, and program synthesis, achieving up to a 62% improvement in utility gain for PDE control.
  • By combining language generation with PDE systems, PDE-Controller shows the potential of LLMs in addressing scientific and engineering challenges.
  • All data, model checkpoints, and code related to PDE-Controller are available at https://pde-controller.github.io/.


Source: Arxiv

Anomaly Detection via Autoencoder Composite Features and NCE

  • Unsupervised anomaly detection is a challenging task commonly tackled with autoencoders and generative models.
  • Autoencoders are often used to model normal data distribution and identify anomalies by high reconstruction error.
  • The proposed approach involves a decoupled training using both an autoencoder and a likelihood model with noise contrastive estimation (NCE).
  • NCE estimates a probability density function for anomaly scoring over the joint space of the autoencoder's latent representation and reconstruction-quality features (see the sketch after this list).
  • To improve NCE's false negative rate, reconstruction features are systematically varied during training to optimize the noise distribution.
  • Experimental assessments on multiple benchmark datasets show that the proposed approach matches the performance of leading anomaly detection algorithms.
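
Here is one way the composite-feature scoring could look: an autoencoder contributes a latent code and a reconstruction-error feature, and a density model is trained with the standard NCE logistic objective against a known noise distribution. The architectures, the noise choice, and the feature set are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class AE(nn.Module):
    def __init__(self, d_in=32, d_z=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, d_z), nn.ReLU())
        self.dec = nn.Linear(d_z, d_in)

    def features(self, x):
        z = self.enc(x)
        err = ((self.dec(z) - x) ** 2).mean(dim=1, keepdim=True)
        return torch.cat([z, err], dim=1)   # joint space: latent + recon error

class NCEDensity(nn.Module):
    """Unnormalized log-density trained as a data-vs-noise logistic classifier."""

    def __init__(self, d):
        super().__init__()
        self.logp = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 1))

    def nce_loss(self, feats, noise, log_q):
        # Logistic loss on log p(x) - log q(x), where q is the noise density.
        s_data = self.logp(feats).squeeze(-1) - log_q(feats)
        s_noise = self.logp(noise).squeeze(-1) - log_q(noise)
        bce = nn.functional.binary_cross_entropy_with_logits
        return (bce(s_data, torch.ones_like(s_data))
                + bce(s_noise, torch.zeros_like(s_noise)))

ae, nce = AE(), NCEDensity(d=9)
feats = ae.features(torch.randn(16, 32))
# Standard-normal noise; the normalizing constant is omitted, which only
# shifts the learned log-density by a constant.
log_q = lambda f: -0.5 * (f ** 2).sum(dim=1)
print(nce.nce_loss(feats, torch.randn(16, 9), log_q))
```

At test time, -logp(features) would serve as the anomaly score, with high values flagging anomalies.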


Source: Arxiv

Bias Detection via Maximum Subgroup Discrepancy

  • Bias evaluation is crucial for ensuring AI systems are trustworthy by assessing data quality and AI outputs.
  • Classical metrics like Total Variation and Wasserstein distances have high sample complexities, leading to limitations in many practical scenarios.
  • A new distance metric called Maximum Subgroup Discrepancy (MSD) is proposed in this paper.
  • MSD deems two distributions close when the discrepancy is low across all feature subgroups (a brute-force sketch follows this list).
  • Despite an exponential number of subgroups, the sample complexity of MSD remains linear in the number of features, making it practical for real-world applications.
  • An algorithm based on mixed-integer optimization (MIO) is introduced to evaluate the distance.
  • MSD is easily interpretable, facilitating bias identification and correction.
  • The paper also introduces MSDD distances, a general bias-detection framework into which MSD fits naturally.
  • Empirical evaluations comparing MSD with other metrics demonstrate its effectiveness on real-world datasets.
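
To make the definition concrete, the brute-force sketch below computes the maximum discrepancy over conjunctive subgroups of binary features; the paper's MIO algorithm avoids this exhaustive enumeration, and the subgroup language and toy data here are assumptions.

```python
import itertools
import numpy as np

def msd_brute_force(A, B, max_literals=2):
    """Max gap in subgroup frequency between samples A, B: (n, d) binary arrays."""
    d = A.shape[1]
    best, best_group = 0.0, None
    for r in range(1, max_literals + 1):
        for cols in itertools.combinations(range(d), r):
            for vals in itertools.product([0, 1], repeat=r):
                in_a = np.all(A[:, cols] == vals, axis=1).mean()
                in_b = np.all(B[:, cols] == vals, axis=1).mean()
                if abs(in_a - in_b) > best:
                    best, best_group = abs(in_a - in_b), dict(zip(cols, vals))
    # The witnessing subgroup is what makes the detected bias interpretable.
    return best, best_group

rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(500, 4))
B = A.copy()
B[:, 0] &= B[:, 1]                 # inject a subgroup-level distribution shift
print(msd_brute_force(A, B))
```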


Source: Arxiv

Discovering Physics Laws of Dynamical Systems via Invariant Function Learning

  • Researchers have developed a method called Disentanglement of Invariant Functions (DIF) to learn the underlying laws of dynamical systems governed by ordinary differential equations.
  • The key challenge was to discover intrinsic dynamics across multiple environments while avoiding environment-specific mechanisms.
  • The method addresses complex environments where changes extend beyond function coefficients to entirely different function forms.
  • For example, it can identify the natural-motion term of an ideal pendulum, α² sin(θ_t), by observing pendulum dynamics in varied environments.
  • The problem is formulated as an invariant function learning task grounded in causal analysis.
  • A causal graph and an encoder-decoder hypernetwork are designed in the DIF method to disentangle invariant functions from environment-specific dynamics.
  • The method ensures the independence between extracted invariant functions and environments through an information-based principle.
  • Quantitative comparisons with meta-learning and invariant learning baselines on three ODE systems have shown the effectiveness and efficiency of the DIF method.
  • Symbolic regression explanation results demonstrate the framework's ability to uncover intrinsic laws.
  • The code for the method has been made available as part of the AIRS library on GitHub.


Source: Arxiv

QuEST: Stable Training of LLMs with 1-Bit Weights and Activations

  • One approach to reducing the costs of large language models (LLMs) is through the use of quantized or sparse representations for training or deployment.
  • While post-training compression methods are popular, there is interest in obtaining more accurate compressed models by directly training over such representations with Quantization-Aware Training (QAT).
  • A recent study suggested that models can be trained with QAT at 8-bit weights and activations while maintaining accuracy.
  • The new QuEST method advances the state of the art, demonstrating optimality at 4 bits and stable convergence with weights and activations as low as 1 bit.
  • QuEST achieves this through accurate, fast quantization of weights and activations via Hadamard normalization and MSE-optimal fitting, plus a trust gradient estimator that minimizes the error between noisy quantized gradients and full-precision gradients (a simplified sketch follows this list).
  • Experiments show that QuEST induces stable scaling laws across various precisions and can be extended to sparse representations.
  • GPU kernel support is provided to efficiently execute models produced by QuEST.
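
The sketch below illustrates two of the ingredients the summary names, a Hadamard rotation before quantization and an MSE-minimizing scale fit, with a plain straight-through gradient standing in for QuEST's trust estimator. All of it is a simplified assumption, not QuEST's actual algorithm.

```python
import torch

def hadamard(n):
    """Sylvester-construction Hadamard matrix; n must be a power of two."""
    H = torch.ones(1, 1)
    while H.size(0) < n:
        H = torch.cat([torch.cat([H, H], 1), torch.cat([H, -H], 1)], 0)
    return H / torch.sqrt(torch.tensor(float(n)))

class STEQuantize(torch.autograd.Function):
    """Low-bit quantization with an MSE-fitted scale and straight-through grads."""

    @staticmethod
    def forward(ctx, x, bits=4):
        qmax = 2 ** (bits - 1) - 1                    # symmetric grid, e.g. +/-7
        def dequant(s):
            return (x / s * qmax).round().clamp(-qmax, qmax) * s / qmax
        # Grid-search the scale that minimizes quantization MSE.
        scales = x.abs().max() * torch.linspace(0.3, 1.0, 20)
        best = min(scales, key=lambda s: ((x - dequant(s)) ** 2).sum())
        return dequant(best)

    @staticmethod
    def backward(ctx, grad_out):
        # Straight-through estimator: pass gradients through unchanged.
        # QuEST's trust estimator instead reweights them against quantization noise.
        return grad_out, None

w = torch.randn(64, 64)
w_q = STEQuantize.apply(hadamard(64) @ w)   # quantize in the rotated basis
print(w_q.unique().numel())                 # only a handful of distinct values
```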


Source: Arxiv

Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts

  • A study explores hierarchical meta-learning in dynamical system reconstruction (DSR) using a Mixture of Experts (MoE) approach.
  • While conventional MoEs faced challenges in hierarchical DSR due to slow updates and conflicted routing, a new method called MixER is introduced.
  • MixER, a sparse top-1 MoE layer, incorporates a custom gating-update algorithm based on K-means and least squares for more effective training and scalability (a generic top-1 MoE sketch follows this list).
  • Experiments validate MixER's efficiency and scalability in handling systems with up to ten parametric ordinary differential equations.
  • However, MixER falls short compared to existing meta-learners in scenarios with abundant data, especially when each expert processes only a fraction of a dataset with closely related data points.
  • Analysis with synthetic and neuroscientific time series data indicates that MixER's performance is influenced by the presence of hierarchical structure in the data.
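
For readers unfamiliar with sparse routing, the sketch below is a plain top-1 mixture-of-experts layer in which each input is dispatched to exactly one expert. The expert shapes and the learned softmax gate are generic assumptions; MixER's K-means/least-squares gating update is not reproduced.

```python
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    """Sparse mixture of experts: each input is routed to a single expert."""

    def __init__(self, d_in=8, d_out=8, n_experts=4):
        super().__init__()
        self.d_out = d_out
        self.gate = nn.Linear(d_in, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_in, 32), nn.Tanh(), nn.Linear(32, d_out))
            for _ in range(n_experts))

    def forward(self, x):                      # x: (batch, d_in)
        choice = self.gate(x).argmax(dim=-1)   # hard top-1 routing
        out = torch.zeros(x.size(0), self.d_out)
        for k, expert in enumerate(self.experts):
            sel = choice == k
            if sel.any():
                out[sel] = expert(x[sel])      # each expert sees its own slice
        return out

moe = Top1MoE()
print(moe(torch.randn(10, 8)).shape)           # (10, 8)
```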

