techminis

A naukri.com initiative

ML News

Marktechpost · 4d

How Much Do Language Models Really Memorize? Meta’s New Framework Defines Model Capacity at the Bit Level

  • Modern language models are under scrutiny for their memorization behavior, raising the question of whether they memorize training data in a meaningful way.
  • Existing techniques like data extraction and privacy mechanisms struggle to differentiate between memorization and generalization.
  • Researchers propose a novel method to measure model capacity by separating memorization into unintended and generalization components.
  • They found GPT language models have about 3.6 bits-per-parameter capacity and developed scaling laws for membership inference.
  • Experiments involved training GPT-2 models with various configurations and sizes on synthetic and real-text datasets.
  • Key findings include a capacity of 3.5 to 3.6 bits per parameter, double descent phenomena, and the effect of numerical precision on storage capacity (see the rough check below).
  • The study disentangles memorization and generalization effects, showing increased unintended memorization with more parameters.
  • Membership inference accuracy decreases with larger datasets, but scaling laws are consistent for models up to 1.5B parameters.
  • The framework enhances understanding of how transformer models encode data and distinguishes between memorization and generalization.
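
A rough sanity check of what a 3.6 bits-per-parameter budget implies. The constant comes from the summary above; the parameter count, token count, and the uniform-entropy upper bound are illustrative assumptions, not the paper's measurement procedure.

```python
import math

BITS_PER_PARAM = 3.6  # capacity estimate reported for GPT-family models

def model_capacity_bits(num_params: float) -> float:
    # Total storage budget implied by the bits-per-parameter figure.
    return BITS_PER_PARAM * num_params

def dataset_entropy_bits(num_tokens: float, vocab_size: int = 50257) -> float:
    # Loose upper bound: every token carries at most log2(vocab_size) bits.
    return num_tokens * math.log2(vocab_size)

params = 124e6   # assumed: a GPT-2-small-sized model
tokens = 1e9     # assumed: hypothetical training-set size
print(f"model capacity  ~ {model_capacity_bits(params):.2e} bits")
print(f"dataset (bound) ~ {dataset_entropy_bits(tokens):.2e} bits")
# Once the dataset term dwarfs the capacity term, per-example unintended
# memorization has to shrink, consistent with membership inference getting
# harder as datasets grow.
```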

Medium · 4d

Predicting Concrete’s Compressive Strength with Machine Learning

  • The project applies big data and machine learning to predict concrete’s 28-day compressive strength in seconds, complementing traditional lab testing.
  • Data from Kaggle is used, incorporating features like water-to-cement ratio and weights of aggregates to enhance accuracy.
  • Visualizations like heatmaps and pair plots are used to assess correlations between input features and compressive strength.
  • Data is split into training, cross-validation, and test sets for model evaluation using SGD, XGBoost, and ANN algorithms.
  • Hyperparameters are tuned for each model; XGBoost reaches 91% to 92% accuracy on the cross-validation and test sets (a minimal training sketch follows this list).
  • An ANN, modeled on biological neurons, also performs well but reaches a slightly lower test-set accuracy of 86.8% than XGBoost.
  • Early stopping with a patience of 100 over 200 training epochs is used to curb overfitting in the ANN model.
  • XGBoost algorithm demonstrated superior performance in this concrete compressive strength prediction project.
  • A web app deploying the ANN model for interactive use is provided for further exploration.
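
A minimal sketch of the XGBoost baseline described above. The CSV path and column names are assumed placeholders (the Kaggle concrete dataset is not bundled here); the 60/20/20 split mirrors the train/cross-validation/test setup, and the hyperparameters are illustrative rather than the article's tuned values.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from xgboost import XGBRegressor

# Assumed file and column names for the Kaggle concrete-strength data.
df = pd.read_csv("concrete_data.csv")
X, y = df.drop(columns=["strength"]), df["strength"]

# 60/20/20 train / cross-validation / test split.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=42)
X_cv, X_test, y_cv, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

# Illustrative hyperparameters, not the tuned values from the article.
model = XGBRegressor(n_estimators=500, learning_rate=0.05, max_depth=4)
model.fit(X_train, y_train, eval_set=[(X_cv, y_cv)], verbose=False)

print("CV   R^2:", r2_score(y_cv, model.predict(X_cv)))
print("Test R^2:", r2_score(y_test, model.predict(X_test)))
```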

Arxiv · 4d

KVmix: Gradient-Based Layer Importance-Aware Mixed-Precision Quantization for KV Cache

  • A new method named KVmix is proposed for mixed-precision quantization of Key-Value (KV) Cache to address high memory demands in Large Language Models (LLMs) inference.
  • KVmix utilizes gradient-based importance analysis to allocate layer-specific bit-widths, prioritizing important layers while aggressively quantizing less critical ones.
  • It introduces a dynamic long-context optimization strategy to balance accuracy and efficiency by keeping full-precision KV pairs for recent pivotal tokens and compressing older ones.
  • KVmix achieves near-lossless inference performance on LLMs like Llama and Mistral with significant memory compression and speedup in inference throughput.
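
A toy illustration of the layer-wise bit allocation idea: layers judged more important keep a wider bit-width, the rest are quantized more aggressively. The importance scores here are random placeholders, whereas KVmix derives them from gradient-based analysis; the bit-widths and the top-25% split are assumptions for illustration.

```python
import numpy as np

def allocate_bits(importance, high_bits=8, low_bits=4, high_fraction=0.25):
    """Give the most important fraction of layers the wider bit-width."""
    n_high = max(1, int(len(importance) * high_fraction))
    order = np.argsort(importance)[::-1]       # most important layers first
    bits = np.full(len(importance), low_bits)
    bits[order[:n_high]] = high_bits
    return bits

rng = np.random.default_rng(0)
importance = rng.random(32)                    # placeholder scores for a 32-layer model
print(allocate_bits(importance))
```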

Arxiv · 4d

Bi-level Unbalanced Optimal Transport for Partial Domain Adaptation

  • The partial domain adaptation (PDA) problem involves aligning cross-domain samples while identifying outlier classes so that knowledge transfers accurately.
  • The proposed approach, Bi-level Unbalanced Optimal Transport (BUOT), addresses biases in the widely used weighting framework by incorporating both sample-wise and class-wise relations in a unified transport formulation (a generic Sinkhorn sketch follows this list).
  • BUOT model introduces a cooperation mechanism between sample-level and class-level transport for effective knowledge transfer and outlier identification.
  • Experiments on benchmark datasets confirm the competitiveness and efficiency of the BUOT model in tackling the challenges of partial domain adaptation.
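
For context, the sample-level transport that such methods build on can be computed with standard entropic Sinkhorn iterations. The sketch below shows only that generic building block, not BUOT's bi-level, unbalanced, or class-level machinery.

```python
import numpy as np

def sinkhorn(cost, a, b, reg=0.5, n_iter=200):
    """Entropic-OT coupling between histograms a and b under a cost matrix."""
    K = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
Xs, Xt = rng.standard_normal((5, 3)), rng.standard_normal((4, 3))   # toy source/target samples
cost = ((Xs[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)             # squared Euclidean costs
a, b = np.full(5, 1 / 5), np.full(4, 1 / 4)                         # uniform sample weights
plan = sinkhorn(cost, a, b)
print(plan.shape, plan.sum())                                       # (5, 4), total mass ~ 1
```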

Arxiv · 4d

FlowBERT: Prompt-tuned BERT for variable flow field prediction

  • A new study proposes a flow-field prediction framework based on knowledge transfer from a large language model (LLM), targeting the high computational cost of computational fluid dynamics solvers and the limited cross-condition transfer of existing deep learning models.
  • The framework integrates Proper Orthogonal Decomposition (POD) dimensionality reduction, sketched below, with fine-tuning strategies for the pretrained LLM, compressing flow-field features and encoding system dynamics in state space.
  • Fluid dynamics-oriented text templates enrich the contextual semantic information; the resulting model outperforms conventional Transformer models in few-shot scenarios and generalizes across different inflow conditions and airfoil geometries.
  • The approach significantly reduces prediction time to seconds while maintaining over 90% accuracy compared to traditional Navier-Stokes equation solvers, potentially impacting aerodynamic optimization, flow control, and other engineering applications.
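
A small sketch of the POD compression step via truncated SVD of a snapshot matrix, with synthetic data standing in for real flow fields; the prompt-tuned LLM stage that consumes the reduced coefficients is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
snapshots = rng.standard_normal((5000, 200))   # synthetic (grid points x time snapshots)

U, s, Vt = np.linalg.svd(snapshots, full_matrices=False)
r = 10                                          # number of retained POD modes
coeffs = np.diag(s[:r]) @ Vt[:r]                # low-dimensional trajectory, shape (r, 200)
recon = U[:, :r] @ coeffs                       # flow field reconstructed from r modes

energy = (s[:r] ** 2).sum() / (s ** 2).sum()    # fraction of variance captured by r modes
print(f"reconstruction shape {recon.shape}, captured energy ~ {energy:.2%}")
```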

Arxiv · 4d

Modality-Balancing Preference Optimization of Large Multimodal Models by Adversarial Negative Mining

  • A new study introduces Modality-Balancing Preference Optimization (MBPO) to address modality imbalance in Large Multimodal Models (LMMs).
  • MBPO generates hard negatives to counter biases in Large Language Model (LLM) backbones and incorporates online responses with verified rewards using Group Relative Policy Optimization (GRPO).
  • The method aims to improve reasoning in LMMs and reduce hallucinations by counteracting the backbone’s tendency to favor language priors over visual inputs.
  • Experiments show that MBPO enhances performance on vision-language tasks and effectively combats modality imbalance in LMMs.

Arxiv · 4d

Recipes for Pre-training LLMs with MXFP8

  • Precision scaling with fewer bits is being used in pre-training LLMs to improve GPU efficiency without sacrificing accuracy.
  • NVIDIA's latest Blackwell GPUs employ Microscaling (MX) formats, combining narrow floating-point data types with per-block scaling factors for quantizing tensors.
  • While MX-formats offer improved numeric stability, careful usage is required to ensure successful convergence of LLMs on large datasets.
  • The study proposes an improved rounding mode using round-to-infinity to compute scaling factors, allowing successful pre-training in MXFP8 for an 8B model on 15T tokens.
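
A simplified sketch of the per-block scaling choice this recipe hinges on: rounding the scale exponent up (round-to-infinity) guarantees the largest element of a block still fits in the narrow format, whereas rounding to nearest can overflow. The E4M3 maximum and 32-element block size are standard; representing the scale as a plain float rather than an MX exponent is a simplification.

```python
import numpy as np

FP8_E4M3_MAX = 448.0   # largest normal magnitude representable in FP8 E4M3

def mx_scale(block, round_up=True):
    """Power-of-two scale shared by one block of values."""
    amax = np.abs(block).max()
    exp = np.log2(amax / FP8_E4M3_MAX)
    exp = np.ceil(exp) if round_up else np.round(exp)   # round-to-infinity vs nearest
    return 2.0 ** exp

rng = np.random.default_rng(0)
block = rng.standard_normal(32).astype(np.float32)      # one 32-element MX block
scale = mx_scale(block, round_up=True)
scaled = block / scale                                  # values that would be cast to FP8
print(scale, bool(np.abs(scaled).max() <= FP8_E4M3_MAX))  # rounding up guarantees no overflow
```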

Arxiv · 4d

ST-GraphNet: A Spatio-Temporal Graph Neural Network for Understanding and Predicting Automated Vehicle Crash Severity

  • Researchers introduce ST-GraphNet, a spatio-temporal graph neural network framework for understanding and predicting automated vehicle crash severity.
  • ST-GraphNet utilizes fine-grained and region-aggregated spatial graphs constructed from real-world AV-related crash reports from Texas.
  • The framework employs multimodal data enriched with semantic, spatial, and temporal attributes, including textual embeddings from crash narratives.
  • ST-GraphNet achieves a test accuracy of 97.74% using a Dynamic Spatio-Temporal GCN on a coarse-grained spatial graph, demonstrating superior performance compared to fine-grained models.

Arxiv · 4d

STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation

  • STAMImputer is a Spatio-Temporal Attention Mixture of Experts network designed for traffic data imputation.
  • It addresses challenges in extracting features from block-wise missing data scenarios and handling distribution shifts for nonstationary traffic data.
  • The network incorporates a Mixture of Experts framework to capture latent spatio-temporal features and uses a Low-rank guided Sampling Graph ATtention mechanism for spatial feature propagation.
  • Extensive experiments on four traffic datasets show that STAMImputer outperforms existing state-of-the-art approaches in traffic data imputation.

Arxiv · 4d

Eliciting Fine-Tuned Transformer Capabilities via Inference-Time Techniques

  • Large language models have transformed natural language processing, yet supervised fine-tuning (SFT) remains computationally intensive.
  • Capabilities acquired through supervised fine-tuning can be approximated by a base transformer model using inference-time techniques like in-context learning (ICL), without altering model parameters.
  • The paper extends these results to practical scenarios with finite context lengths and partial dataset access, providing insights into resource-efficient deployment of large language models.
  • For text generation and linear classification tasks, the paper identifies dataset sizes sufficient to approximate fine-tuned behavior within specified error margins, offering a theoretical foundation for real-world deployment (a minimal prompt-construction sketch follows this list).
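
A minimal sketch of the inference-time idea: rather than fine-tuning, a handful of labelled examples from the (partially accessible) dataset are placed in the context so the frozen base model imitates fine-tuned behavior. The prompt format, helper name, and choice of k are illustrative assumptions.

```python
def build_icl_prompt(examples, query, k=8):
    """Place k labelled examples in the context, then ask for the query."""
    lines = [f"Input: {x}\nOutput: {y}" for x, y in examples[:k]]
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

demo = [("2 + 2", "4"), ("3 + 5", "8")]        # placeholder dataset rows
print(build_icl_prompt(demo, "7 + 6", k=2))    # feed to a frozen base model as-is
```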

Arxiv · 4d

FairDICE: Fairness-Driven Offline Multi-Objective Reinforcement Learning

  • FairDICE is a new framework for Fairness-Driven Offline Multi-Objective Reinforcement Learning.
  • It learns policies in the presence of conflicting objectives by directly optimizing nonlinear welfare objectives.
  • FairDICE uses distribution correction estimation to account for welfare maximization and distributional regularization.
  • It shows strong fairness-aware performance across multiple offline benchmarks compared to existing baselines.

Arxiv · 4d

Lite-RVFL: A Lightweight Random Vector Functional-Link Neural Network for Learning Under Concept Drift

  • Lite-RVFL is a lightweight, fast, and efficient neural network designed to handle concept drift without requiring drift detection or model retraining.
  • It introduces a novel objective function that assigns exponentially increasing weights to newer samples, allowing timely adaptation to new data (see the sketch after this list).
  • The theoretical analysis supports the feasibility of Lite-RVFL's objective function for drift adaptation, and an efficient incremental update rule is derived.
  • Experimental results on a safety assessment task demonstrate Lite-RVFL's efficiency, effectiveness in adapting to drift, and ability to capture temporal patterns.
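
A sketch of the core of such an objective: a ridge-style readout solved with exponentially increasing sample weights, so newer samples dominate as the data drifts. The random-feature expansion of an RVFL network and the paper's incremental update rule are omitted; the growth factor and regularizer are assumed values.

```python
import numpy as np

def weighted_readout(H, Y, growth=1.05, reg=1e-3):
    """Ridge-style readout with exponentially increasing sample weights.

    H: (n, d) hidden features in arrival order, Y: (n, k) targets.
    """
    n = H.shape[0]
    w = growth ** np.arange(n)                 # newest samples weighted most
    A = H.T @ (w[:, None] * H) + reg * np.eye(H.shape[1])
    return np.linalg.solve(A, H.T @ (w[:, None] * Y))

rng = np.random.default_rng(0)
H = rng.standard_normal((200, 16))
Y = H @ rng.standard_normal((16, 1)) + 0.1 * rng.standard_normal((200, 1))
print(weighted_readout(H, Y).shape)            # (16, 1) output weights
```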

Arxiv · 4d

Info-Coevolution: An Efficient Framework for Data Model Coevolution

  • Info-Coevolution is a new framework proposed for efficient dataset construction and training in machine learning.
  • It addresses the challenge of determining if new data needs annotation given the existing model and data.
  • The framework selectively annotates and integrates online data to enhance datasets efficiently without bias.
  • Info-Coevolution has shown promising results in reducing annotation and training costs for datasets like ImageNet-1K.
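
A toy version of the selection decision: only route a new sample to annotation when the current model is uncertain about it. The entropy criterion and threshold are placeholders; Info-Coevolution's actual criterion, which also accounts for the existing data, is more involved.

```python
import numpy as np

def needs_annotation(probs, threshold=1.0):
    """Route a sample to annotation only when predictive entropy is high."""
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    return bool(entropy > threshold)

print(needs_annotation(np.array([0.90, 0.05, 0.05])))   # confident prediction -> False
print(needs_annotation(np.array([0.40, 0.35, 0.25])))   # uncertain prediction -> True
```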

Arxiv · 4d

Benchmarking Pre-Trained Time Series Models for Electricity Price Forecasting

  • Accurate electricity price forecasting is crucial for power trading on the spot market.
  • The study benchmarks several pre-trained time series models for day-ahead electricity price forecasting.
  • Time series foundation models (TSFMs) such as Chronos-Bolt and Time-MoE performed well, but the biseasonal MSTL model stood out with consistently strong performance; a minimal decomposition sketch follows this list.
  • The study used 2024 day-ahead auction electricity prices from Germany, France, the Netherlands, Austria, and Belgium for evaluation.
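
A short sketch of a biseasonal MSTL decomposition with daily (24h) and weekly (168h) periods, using statsmodels' MSTL (available in recent versions). Synthetic prices stand in for the 2024 day-ahead auction data, and forecasting of the decomposed components is not shown.

```python
import numpy as np
from statsmodels.tsa.seasonal import MSTL

rng = np.random.default_rng(0)
hours = np.arange(24 * 7 * 8)                        # eight weeks of hourly data
prices = (50
          + 10 * np.sin(2 * np.pi * hours / 24)      # daily cycle
          + 5 * np.sin(2 * np.pi * hours / 168)      # weekly cycle
          + rng.normal(0, 2, hours.size))            # noise

result = MSTL(prices, periods=(24, 168)).fit()       # biseasonal decomposition
print(result.trend.shape, result.seasonal.shape)     # trend plus one seasonal column per period
```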

Arxiv · 4d

Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning

  • Large language models have impressive reasoning capabilities but suffer from inefficiencies due to verbose outputs.
  • Most reinforcement learning works focus on accuracy rather than reasoning efficiency.
  • The proposed Bingo framework uses significance-aware and dynamic length rewards to boost efficient reasoning.
  • Experiments show that Bingo improves accuracy and efficiency, outperforming other reward baselines.
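
An illustrative stand-in for the length side of such a reward: correct answers earn full credit while a length penalty that ramps up during training nudges the policy toward shorter reasoning. The constants and schedule are made up, and the significance-aware component is not modeled; this is not Bingo's actual reward definition.

```python
def reward(correct: bool, num_tokens: int, step: int, total_steps: int) -> float:
    """Toy reward: credit for correctness minus a length penalty that grows over training."""
    penalty_scale = 1e-4 * (step / total_steps)
    return (1.0 if correct else 0.0) - penalty_scale * num_tokens

print(reward(True, 800, step=900, total_steps=1000))   # long correct answer late in training: 0.928
print(reward(True, 200, step=900, total_steps=1000))   # shorter correct answer is preferred: 0.982
```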
