techminis

A naukri.com initiative

ML News

Arxiv

Atmospheric model-trained machine learning selection and classification of ultracool TY dwarfs

  • A new machine learning framework has been developed for detecting and classifying late-T and Y dwarfs, which represent the coolest and lowest-mass population of brown dwarfs.
  • The framework was trained on synthetic photometry from atmospheric models, yielding a dataset larger than any empirical set of ultracool dwarfs (UCDs) later than T6 and enabling spectral-type classification of these objects.
  • Validation results showed high performance on both synthetic and empirical datasets, with object classification metrics exceeding 99% accuracy and an average spectral type precision within 0.35 +/- 0.37 subtypes.
  • Application of the model around Pisces and the UKIDSS UDS field led to the discovery of a previously uncatalogued T8.2 candidate, showcasing the effectiveness of this model-trained approach in finding faint, late-type UCDs from photometric catalogs.

Arxiv

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

  • GLM-4.1V-Thinking is a vision-language model (VLM) that aims to advance multimodal reasoning.
  • The model utilizes Reinforcement Learning with Curriculum Sampling (RLCS) to enhance its capabilities across various tasks.
  • GLM-4.1V-9B-Thinking, an open-source version, achieves state-of-the-art performance on multiple benchmarks.
  • The model surpasses similar-sized models and even outperforms larger models on various tasks like STEM reasoning and long document understanding.

Arxiv

Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs

  • A new study presents the Junk DNA Hypothesis, focusing on the pre-trained weights of large language models like GPT-3.
  • The hypothesis challenges the belief that pruning small weights in LLMs does not affect performance, suggesting that these weights actually encode vital information for challenging tasks.
  • Removing these seemingly insignificant weights can lead to an irreversible loss of knowledge and performance decline in difficult tasks, even with continued training.
  • According to the study, quantization does not show the same effect: unlike weight pruning, it does not clearly expose task-difficulty information. Extensive experiments support the Junk DNA Hypothesis.
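The "pruning small weights" operation discussed above is commonly implemented as magnitude pruning. The following is a minimal sketch of that generic technique on a flat list of weights, not the study's own code; the function name `magnitude_prune` and the sample weights are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of entries with the smallest |w|."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Hypothetical weights: the three smallest-magnitude entries are removed.
weights = [0.8, -0.01, 0.03, -1.2, 0.002, 0.4]
print(magnitude_prune(weights, 0.5))
```

The hypothesis summarized above is that the entries zeroed here, although small, may still carry information needed for harder downstream tasks.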

Arxiv

Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity

  • Large Language Models (LLMs) are known for their high performance but face challenges in practical deployment due to their large size.
  • Efforts have been made to apply traditional network pruning techniques to LLMs to reduce their size without impacting performance.
  • A new pruning methodology called Outlier Weighed Layerwise sparsity (OWL) has been introduced, which considers non-uniform layerwise sparsity ratios based on outlier ratios within each layer.
  • Empirical evaluations show that OWL outperforms previous methods, achieving significant performance gains and faster inference speeds at high sparsity levels.
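To make the idea of non-uniform layerwise sparsity concrete, here is a rough sketch of how per-layer outlier ratios could be turned into per-layer sparsity targets. It is an illustration under stated assumptions, not the OWL implementation: the outlier definition (scores above `m` times the layer mean), the linear rescaling, and the names `outlier_ratio`, `layer_sparsities`, and `max_shift` are all hypothetical, and layers are assumed to be equally sized so that the mean of the per-layer values equals the global target.

```python
def outlier_ratio(scores, m=5.0):
    """Fraction of scores larger than m times the mean score (assumed definition)."""
    mean = sum(scores) / len(scores)
    return sum(1 for s in scores if s > m * mean) / len(scores)


def layer_sparsities(outlier_ratios, target, max_shift):
    """Give layers with more outliers a lower sparsity.

    Ratios are rescaled linearly into a band of width 2 * max_shift and
    shifted so that the mean over layers equals `target`. No clipping to
    [0, 1] is applied, so choose `target` and `max_shift` accordingly.
    """
    lo, hi = min(outlier_ratios), max(outlier_ratios)
    if hi == lo:
        return [target] * len(outlier_ratios)
    scaled = [(hi - r) / (hi - lo) * 2 * max_shift for r in outlier_ratios]
    mean_scaled = sum(scaled) / len(scaled)
    return [target + s - mean_scaled for s in scaled]


# Hypothetical per-layer outlier ratios for a three-layer model.
ratios = [0.1, 0.3, 0.2]
print(layer_sparsities(ratios, target=0.7, max_shift=0.08))
```

In this sketch the layer with the highest outlier ratio is pruned least, which is the qualitative behavior the summary describes.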

Arxiv

Identifying the Truth of Global Model: A Generic Solution to Defend Against Byzantine and Backdoor Attacks in Federated Learning (full version)

  • Federated Learning (FL) allows organizations to train machine learning models collectively without sharing raw data, but it can be vulnerable to attacks like Byzantine and backdoor attacks.
  • A new solution called FedTruth has been proposed to defend against malicious model updates in FL by estimating a 'ground-truth model update' without the need for a benign root dataset or assumptions on data distribution.
  • FedTruth considers contributions from all benign clients and employs dynamic aggregation weights to reduce the impact of poisoned model updates, making it effective against Byzantine and backdoor attacks in large-scale FL systems.
  • Overall, FedTruth aims to make federated learning more robust to model poisoning attacks.

Arxiv

Soft Dice Confidence: A Near-Optimal Confidence Estimator for Selective Prediction in Semantic Segmentation

  • Selective prediction in semantic segmentation involves the use of a confidence score function to allow models to abstain from offering unreliable predictions.
  • A new confidence score function, Soft Dice Confidence (SDC), is proposed for binary semantic segmentation, aligning directly with the Dice coefficient metric without needing tuning or additional held-out data.
  • The SDC is shown to be near optimal under conditional independence, with upper and lower bounds established on its performance compared to the ideal confidence score function.
  • In experiments on various datasets, SDC outperforms all prior confidence estimators while requiring no extra data, making it a robust and efficient tool for selective prediction in semantic segmentation.
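One plausible reading of a confidence score "aligned with the Dice coefficient" is a soft Dice computed between the predicted foreground probabilities and the hard mask obtained by thresholding them. The sketch below follows that reading; the exact formula, the 0.5 threshold, the empty-image convention, and the name `soft_dice_confidence` are assumptions made for illustration, not details taken from the summary.

```python
def soft_dice_confidence(probs, threshold=0.5):
    """Soft Dice between foreground probabilities and their thresholded mask."""
    mask = [1.0 if p >= threshold else 0.0 for p in probs]
    denom = sum(probs) + sum(mask)
    if denom == 0.0:
        # Assumed convention: an all-background image with zero
        # foreground probability is treated as fully confident.
        return 1.0
    return 2.0 * sum(p * m for p, m in zip(probs, mask)) / denom


# Hypothetical flattened probability maps for two 4-pixel images.
confident = [0.99, 0.98, 0.01, 0.02]
uncertain = [0.60, 0.55, 0.45, 0.40]
print(soft_dice_confidence(confident), soft_dice_confidence(uncertain))
```

A selective predictor would abstain on images whose score falls below a chosen cutoff; here the second image scores lower because its probabilities sit close to the threshold.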

Arxiv

Fully Differentiable Lagrangian Convolutional Neural Network for Physics-Informed Precipitation Nowcasting

  • This paper introduces a convolutional neural network model called LUPIN for precipitation nowcasting that integrates data-driven learning with physics-based domain knowledge.
  • LUPIN utilizes a Lagrangian Double U-Net architecture with components for generating motion fields, extrapolation, and capturing precipitation evolution.
  • The model is fully differentiable and GPU-accelerated, enabling end-to-end training and inference with a data-driven Lagrangian coordinate system transformation.
  • Evaluation results show that LUPIN matches or outperforms existing AI models in extreme event scenarios, demonstrating the potential of Lagrangian machine learning approaches.

Arxiv

The Number of Trials Matters in Infinite-Horizon General-Utility Markov Decision Processes

  • The general-utility Markov decision processes (GUMDPs) framework extends the traditional MDPs by incorporating objective functions dependent on state-action pair visitation frequency induced by a policy.
  • This study analyzes the impact of the number of trials, representing randomly sampled trajectories, in infinite-horizon GUMDPs, revealing its significance in contrast to standard MDPs.
  • Specifically, the expected performance of a given policy in infinite-horizon GUMDPs generally depends on the number of trials, which is not the case in standard MDPs.
  • Policy evaluation under discounted and average GUMDPs is investigated, presenting bounds on the discrepancy between finite and infinite trials formulations and empirical results supporting the findings.
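A toy calculation can show why the number of trials may matter once the objective is a nonlinear function of visitation frequencies. The example below is not from the paper: it assumes each trajectory ends up entirely in one of two states with probability 0.5 each (so a single trajectory's occupancy is one-hot), and it uses a hypothetical convex objective `f(d) = sum(d_i ** 2)` evaluated on the empirical occupancy averaged over K trials.

```python
from math import comb


def f(d):
    """Hypothetical convex objective over an occupancy vector."""
    return sum(x * x for x in d)


def expected_objective(num_trials, p=0.5):
    """Exact E[f(d_hat)] when d_hat averages `num_trials` one-hot occupancies."""
    total = 0.0
    for k in range(num_trials + 1):
        # Probability that k of the trials land in the first state.
        prob = comb(num_trials, k) * p**k * (1 - p) ** (num_trials - k)
        d_hat = (k / num_trials, 1 - k / num_trials)
        total += prob * f(d_hat)
    return total


for K in (1, 2, 10, 100):
    print(K, expected_objective(K))
print("infinite trials:", f((0.5, 0.5)))
```

In this toy setting the expected objective works out to 0.5 + 0.5 / K, so it approaches the infinite-trials value of 0.5 only as K grows. With a linear objective, as in a standard MDP, the gap would vanish for every K.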

Arxiv

Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning

  • Sliding Puzzles Gym (SPGym) is a benchmark created to evaluate visual representation learning in reinforcement learning (RL) agents by transforming the classic 8-tile puzzle into a visual RL task.
  • SPGym allows researchers to isolate and scale the visual representation challenge independently of other learning components by controlling representation learning complexity through adjustable grid sizes and image pools, while maintaining fixed environment dynamics.
  • Experiments with model-free and model-based RL algorithms using SPGym reveal limitations in handling visual diversity, with all algorithms showing performance degradation as the pool of possible images increases.
  • The study highlights the need for improved visual representation learning techniques in RL and positions SPGym as a valuable tool for advancing robust and generalizable decision-making systems.

Arxiv

EvoPress: Accurate Dynamic Model Compression via Evolutionary Search

  • Research on large language model compression has focused on methods like quantization, sparsification, and structured pruning to reduce computational costs.
  • A new approach called EvoPress introduces dynamic, non-uniform compression methods that adjust compression levels per-block or per-layer to minimize accuracy loss while meeting a global compression threshold.
  • EvoPress uses an evolutionary framework to identify optimal compression profiles efficiently, challenging the assumption that compression error is independent across layers in language models.
  • The EvoPress framework achieves state-of-the-art results in dynamic compression of various models like Llama, Mistral, and Phi through techniques such as structural pruning, sparsity, and quantization with dynamic bitwidths.
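The general idea of searching over per-layer compression levels under a fixed global budget can be sketched with a very simple evolutionary loop. This is a generic illustration, not the EvoPress algorithm: the mutation rule, the selection scheme, and the names `mutate`, `evolve`, and `toy_error` are hypothetical. The toy error is separable across layers only to keep the example short; in practice `error_fn` would evaluate the compressed model as a whole.

```python
import random


def mutate(levels, max_level, rng):
    """Move one unit of compression between two layers, keeping the total fixed."""
    child = list(levels)
    donors = [i for i, v in enumerate(child) if v > 0]
    receivers = [i for i, v in enumerate(child) if v < max_level]
    if not donors or not receivers:
        return child
    i, j = rng.choice(donors), rng.choice(receivers)
    if i != j:
        child[i] -= 1
        child[j] += 1
    return child


def evolve(start, error_fn, max_level, generations=200, offspring=8, seed=0):
    """Keep the best profile found; replace it only when a child has lower error."""
    rng = random.Random(seed)
    best, best_err = list(start), error_fn(start)
    for _ in range(generations):
        children = [mutate(best, max_level, rng) for _ in range(offspring)]
        candidate = min(children, key=error_fn)
        err = error_fn(candidate)
        if err < best_err:
            best, best_err = candidate, err
    return best, best_err


# Hypothetical per-layer sensitivities: higher means compression hurts more.
SENSITIVITY = [4.0, 1.0, 0.5, 2.0]


def toy_error(levels):
    return sum(s * l * l for s, l in zip(SENSITIVITY, levels))


profile, err = evolve([2, 2, 2, 2], toy_error, max_level=4)
print(profile, err)
```

Because every mutation moves compression from one layer to another, each candidate meets the same global budget as the starting profile, so the search only compares profiles of equal total compression.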

Arxiv

Integrating Dual Prototypes for Task-Wise Adaption in Pre-Trained Model-Based Class-Incremental Learning

  • A new method called Dual Prototype network for Task-wise Adaption (DPTA) is proposed for Class-Incremental Learning (CIL) using pre-trained models (PTM).
  • DPTA aims to address the challenge of catastrophic forgetting when fine-tuning PTMs on downstream incremental tasks by introducing adapter modules for each task to improve model adaption.
  • The DPTA method utilizes dual prototypes to enhance the prediction process by enabling test-time adapter selection and utilizing augmented prototypes to improve class separability.
  • Experiments on benchmark datasets have shown that DPTA outperforms existing methods in CIL, and the code for DPTA is available on GitHub for further exploration.

Arxiv

STONet: A neural operator for modeling solute transport in micro-cracked reservoirs

  • Researchers introduce a new neural operator, STONet, to model contaminant transport in micro-cracked porous media efficiently.
  • STONet architecture includes a DeepONet structure enriched with a transformer-based multi-head attention mechanism, enhancing performance without added computational overhead.
  • The model combines separate networks to encode the relevant properties and predict changes in the concentration field, achieving relative errors below 1% compared with FEM simulations.
  • STONet's computational efficiency allows for rapid assessment of subsurface contamination risks and optimization of environmental remediation strategies.

Arxiv

UFGraphFR: Graph Federation Recommendation System based on User Text description features

  • Federated recommendation systems aim to balance user privacy and recommendation accuracy by utilizing distributed collaborative learning.
  • Existing federated recommendation methods often overlook user relationships, limiting recommendation effectiveness.
  • UFGraphFR proposes a personalized federated recommendation framework that constructs a user graph based on clients' local text features.
  • Experimental results show that UFGraphFR achieves recommendation accuracy comparable to centralized approaches while maintaining user privacy.

Arxiv

A novel Trunk Branch-net PINN for flow and heat transfer prediction in porous medium

  • A novel Trunk-Branch (TB)-net physics-informed neural network (PINN) architecture has been developed to solve complex problems in porous media.
  • The TB-net PINN incorporates trunk and branch nets to capture global and local features, aiming to address forward flow, forward heat transfer, inverse heat transfer, and transfer learning problems.
  • The architecture uses a fully connected neural network (FNN) as the trunk net and separate FNNs as branch nets, computes partial derivatives of the outputs by automatic differentiation, and combines several physics-based loss terms.
  • The TB-net PINN architecture demonstrated effectiveness, flexibility, and potential for practical engineering applications by solving forward problems and showcasing resource reuse in transfer learning.

Arxiv

Uncertainty Quantification of Wind Gust Predictions in the Northeast United States: An Evidential Neural Network and Explainable Artificial Intelligence Approach

  • Machine learning algorithms are being used to reduce bias in wind gust predictions, but they still struggle with accurately predicting high gusts.
  • A new approach called evidential neural network (ENN) is introduced to address the issue of uncertainty quantification (UQ) in gust predictions by leveraging atmospheric variables from the Weather Research and Forecasting (WRF) model.
  • Explainable AI techniques identified key features contributing to higher uncertainty in gust predictions, which were found to be strongly correlated with storm intensity and spatial gust gradients.
  • The ENN reduced RMSE by 47% relative to WRF and made it possible to construct gust prediction intervals without an ensemble; these intervals captured at least 95% of observed gusts at 179 of 266 stations.
