techminis
A naukri.com initiative

ML News

Source: Arxiv

Implicit Language Models are RNNs: Balancing Parallelization and Expressivity

  • State-space models (SSMs) and transformers dominate language modeling, but both are constrained to lower computational complexity than classical recurrent neural networks (RNNs), which limits their expressivity.
  • RNNs, in turn, cannot be parallelized during training, creating a trade-off between parallelization and expressivity.
  • A new approach proposes implicit SSMs that iterate a transformation until convergence to a fixed point, implementing the non-linear state transitions of RNNs (a toy fixed-point iteration is sketched after this list).
  • Approximate fixed-point convergence is found to be sufficient, allowing a scalable training curriculum with partial parallelization.
  • The implicit SSMs exhibit superior state-tracking capabilities on regular languages compared to transformers and SSMs.
  • Implicit SSMs are scaled to natural language reasoning tasks and to pretraining large-scale language models with up to 1.3B parameters on 207B tokens, the largest implicit models trained to date.
  • The implicit models outperform explicit counterparts on standard benchmarks.
  • Code for the implicit language models is available on GitHub.
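
The core mechanism is easy to see in a few lines. Below is a minimal sketch, assuming a toy tanh state transition (the paper's actual parameterization differs): the hidden state is iterated toward a fixed point and the loop stops early once updates are small, mirroring the approximate-convergence point above.

```python
import numpy as np

def implicit_ssm_step(x, W_h, W_x, tol=1e-5, max_iters=50):
    """Iterate a toy non-linear state transition to an approximate fixed
    point: h* = tanh(W_h @ h* + W_x @ x). Early stopping reflects the
    finding that approximate convergence suffices."""
    h = np.zeros(W_h.shape[0])
    for _ in range(max_iters):
        h_new = np.tanh(W_h @ h + W_x @ x)
        if np.linalg.norm(h_new - h) < tol:
            break
        h = h_new
    return h_new

rng = np.random.default_rng(0)
d = 8
W_h = 0.5 * rng.standard_normal((d, d)) / np.sqrt(d)  # scaled to be contractive
W_x = rng.standard_normal((d, d)) / np.sqrt(d)
h_star = implicit_ssm_step(rng.standard_normal(d), W_h, W_x)
```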

Source: Arxiv

Quality over Quantity: Boosting Data Efficiency Through Ensembled Multimodal Data Curation

  • The paper focuses on enhancing data efficiency by curating web-crawl datasets through an advanced approach named EcoDatum.
  • EcoDatum addresses challenges related to unstructured and heterogeneous datasets, overcoming biases and the exclusion of relevant data often seen in traditional curation methods.
  • The method incorporates quality-guided deduplication for balanced feature distributions and integrates various data curation operators within a weak supervision ensemble framework (a toy ensemble scorer is sketched after this list).
  • Automated optimization is used to effectively score each data point, leading to improved curation quality and efficiency compared to existing techniques.
  • EcoDatum outperforms state-of-the-art methods, ranking 1st on the DataComp leaderboard with an average performance score of 0.182 across 38 evaluation datasets.
  • The approach demonstrated a 28% improvement over the DataComp baseline method, showcasing its effectiveness in enhancing dataset curation and model training efficiency.
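
As a rough illustration of the ensemble idea, the sketch below combines two invented weak curation operators into a single quality score and keeps the samples that clear a threshold; the real system uses far stronger operators and tunes the ensemble automatically.

```python
import numpy as np

# Hypothetical weak curation operators, each voting on quality in [0, 1].
# Real operators might be image-text similarity, OCR filters, or dedup scores.
def op_caption_length(sample):
    return min(len(sample["caption"].split()) / 20.0, 1.0)

def op_not_duplicate(sample):
    return 0.0 if sample["is_dup"] else 1.0

def ensemble_score(sample, ops, weights):
    votes = np.array([op(sample) for op in ops])
    return float(weights @ votes / weights.sum())

samples = [
    {"caption": "a photo of a dog playing on grass", "is_dup": False},
    {"caption": "img", "is_dup": True},
]
ops = [op_caption_length, op_not_duplicate]
weights = np.array([1.0, 2.0])          # would be optimized automatically
curated = [s for s in samples if ensemble_score(s, ops, weights) > 0.5]
```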

Source: Arxiv

A hierarchical approach for assessing the vulnerability of tree-based classification models to membership inference attack

  • Machine learning models can unintentionally reveal confidential data, making them susceptible to membership inference attacks (MIA).
  • New methods were introduced to assess the vulnerability of tree-based models efficiently against MIA: analyzing hyperparameter choices before training and examining the model structure after training.
  • These approaches do not guarantee model safety, but a hierarchical filtering process reduces the number of models that require extensive MIA evaluation.
  • Disclosure-risk rankings of hyperparameter combinations are consistent across datasets, so high-risk configurations can be identified, and avoided, before training.
  • Simple, human-interpretable rules over hyperparameters predict vulnerable combinations with high accuracy, without requiring any model training (a toy rule is sketched after this list).
  • After training, structural metrics of the fitted model serve as indicators of MIA vulnerability.
  • Model accuracy does not necessarily correspond to privacy risk, indicating room for optimizing models for performance and privacy simultaneously.
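
A minimal sketch of the pre-training filter, with invented thresholds (the paper derives its rules empirically from cross-dataset risk rankings):

```python
def high_mia_risk(params: dict) -> bool:
    """Flag tree hyperparameters likely to memorize training rows.
    The thresholds below are illustrative, not the paper's rules."""
    deep = params.get("max_depth") is None or params["max_depth"] > 20
    tiny_leaves = params.get("min_samples_leaf", 1) <= 1
    return deep and tiny_leaves

candidates = [
    {"max_depth": None, "min_samples_leaf": 1},   # flagged before training
    {"max_depth": 8, "min_samples_leaf": 10},     # passes the cheap filter
]
needs_full_mia_eval = [p for p in candidates if high_mia_risk(p)]
```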

Source: Arxiv

GraphThought: Graph Combinatorial Optimization with Thought Generation

  • Graph combinatorial optimization (GCO) problems are crucial in domains like logistics and bioinformatics.
  • Large language models (LLMs) open new avenues for structured reasoning in GCO but remain limited on complex tasks.
  • The Optimal Thoughts Design (OTD) problem is formalized to assist in producing high-quality intermediate reasoning steps.
  • GraphThought is a new framework that generates effective reasoning sequences using either forward search or backward reasoning (a toy forward-search example follows this list).
  • Llama-GT, an 8B-parameter model obtained by fine-tuning an LLM on structured thought sequences, excels at GCO tasks.
  • It outperforms larger models like DeepSeek-V3, showcasing enhanced performance without the need for increased model scale.
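
To make "thought generation" concrete, here is a toy forward-search example for minimum vertex cover: a greedy heuristic runs step by step, and each step is serialized as a thought string of the kind a model could be fine-tuned on. This is only a flavor of the idea; the paper's OTD formulation is more general.

```python
import networkx as nx

def vertex_cover_thoughts(g):
    """Serialize a greedy vertex-cover run as intermediate 'thoughts'."""
    g = g.copy()
    thoughts, cover = [], set()
    while g.number_of_edges() > 0:
        v = max(g.degree, key=lambda kv: kv[1])[0]   # highest-degree node
        thoughts.append(f"Pick node {v} (degree {g.degree[v]}); add to cover.")
        cover.add(v)
        g.remove_node(v)
    thoughts.append(f"All edges covered; solution = {sorted(cover)}.")
    return thoughts

for t in vertex_cover_thoughts(nx.cycle_graph(5)):
    print(t)
```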

Source: Arxiv

From Features to Graphs: Exploring Graph Structures and Pairwise Interactions via GNNs

  • This study focuses on exploring the importance of pairwise interactions in constructing feature graphs for Graph Neural Networks (GNNs).
  • The research leverages existing GNN models and tools to analyze the relationship between feature graph structures and their effectiveness in modeling interactions.
  • Experiments on synthesized datasets reveal that edges connecting interacting features play a crucial role in enabling GNNs to effectively model feature interactions.
  • Including non-interaction edges in the feature graph can introduce noise and degrade model performance.
  • The study also provides theoretical support, based on the Minimum Description Length (MDL) principle, for selecting sparse feature graphs (a toy MDL comparison follows this list).
  • Sparse feature graphs, retaining only necessary interaction edges, are shown to provide a more efficient and interpretable representation compared to complete graphs, in line with Occam's Razor.
  • The findings offer theoretical insights and practical guidelines for designing feature graphs to enhance the performance and interpretability of GNN models.
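
A toy numeric illustration of the MDL argument, with invented bit counts: the complete graph fits the data only slightly better, which does not pay for its much larger model cost.

```python
import math

def description_length(num_edges, num_features, data_fit_bits):
    """Toy two-part MDL: bits to name each edge + bits to encode residuals."""
    possible = num_features * (num_features - 1) // 2
    return num_edges * math.log2(possible) + data_fit_bits

n = 10
sparse_dl = description_length(num_edges=3, num_features=n, data_fit_bits=500.0)
complete_dl = description_length(num_edges=45, num_features=n, data_fit_bits=490.0)
print(sparse_dl < complete_dl)   # True: MDL prefers the sparse feature graph
```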

Source: Arxiv

ETS: Efficient Tree Search for Inference-Time Scaling

  • Test-time compute scaling aims to improve model accuracy by using additional computation at inference time.
  • Efficient Tree Search (ETS) is proposed to address challenges in search methods for inference-time scaling.
  • ETS uses a process reward model to generate and score potential candidates during the search process.
  • Increasing diversity in trajectories during tree search promotes more exploration but consumes more memory.
  • ETS promotes KV sharing by pruning redundant trajectories while maintaining necessary diversity.
  • A linear programming cost model in ETS penalizes the number of retained nodes to promote KV cache sharing (a greedy toy version of this selection follows this list).
  • ETS achieves a 1.8× reduction in average KV cache size during search, yielding 1.4× higher throughput.
  • ETS demonstrates improved performance relative to prior state-of-the-art methods with minimal accuracy degradation.
  • No custom kernel implementation is required for ETS, and the code is available on GitHub.
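
The selection step can be approximated greedily, as in the sketch below, assuming a stand-in process reward model: a candidate that does not share its prefix with an already-retained trajectory pays a penalty, nudging the search toward KV reuse. ETS itself formulates this trade-off as a linear program rather than a greedy pass.

```python
def prm_score(seq):                      # stand-in process reward model
    return sum(seq) - 0.3 * len(seq)

def expand(seq):                         # stand-in candidate continuations
    return [seq + [tok] for tok in (0, 1, 2)]

def ets_select(candidates, width=4, kv_penalty=0.5):
    """Greedily retain `width` trajectories, penalizing unshared prefixes."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < width:
        def utility(c):
            shares = any(c[:-1] == s[:-1] for s in selected)
            return prm_score(c) - (0.0 if shares or not selected else kv_penalty)
        best = max(pool, key=utility)
        selected.append(best)
        pool.remove(best)
    return selected

beams = [[0]]
for _ in range(3):                       # three rounds of expand-and-prune
    beams = ets_select([c for b in beams for c in expand(b)])
```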

Source: Arxiv

Decoding for Punctured Convolutional and Turbo Codes: A Deep Learning Solution for Protocols Compliance

  • Neural network-based decoding methods are being explored to enhance error correction performance.
  • Traditional approaches struggle with the complexities of punctured codes, variable code rates, and protocol compatibility.
  • A unified LSTM-based decoding architecture is proposed to address these challenges for punctured convolutional and Turbo codes.
  • The method integrates the puncturing pattern into the network so that it adapts to varying code rates, and balanced training ensures robustness (a minimal architecture sketch follows this list).
  • Extensive simulations show that the proposed approach outperforms conventional decoding techniques in noise and fading channels.
  • The results highlight LSTM-based decoding as a promising solution for advanced communication systems.
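
A minimal sketch of this general shape (not the paper's exact architecture): the puncturing pattern is fed in alongside the received soft values, so a single network can serve multiple code rates.

```python
import torch
import torch.nn as nn

class PuncturedDecoder(nn.Module):
    """Toy rate-adaptive decoder that takes the puncturing pattern as input."""
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, llrs, puncture_mask):
        # Punctured positions carry no channel observation: zero them out
        # and flag them so the network can learn to infer the missing bits.
        x = torch.stack([llrs * (1 - puncture_mask), puncture_mask], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h).squeeze(-1)        # per-position bit logits

decoder = PuncturedDecoder()
llrs = torch.randn(8, 128)                    # received soft values
mask = (torch.rand(8, 128) < 0.25).float()    # 25% punctured positions
bit_logits = decoder(llrs, mask)
```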

Source: Arxiv

Generative Uncertainty in Diffusion Models

  • Diffusion models have been instrumental in generative modeling advancements.
  • Even though current models produce samples of high quality on average, individual samples can still be poor.
  • Detecting low-quality samples without human intervention is still a complex task.
  • A Bayesian framework is proposed to estimate generative uncertainty of synthetic samples.
  • The framework aims to make Bayesian inference practical for large generative models.
  • A new semantic likelihood is introduced to handle challenges in high-dimensional sample spaces.
  • Experiments demonstrate that the proposed generative uncertainty method effectively identifies poor samples.
  • The Bayesian framework can be applied post-hoc to pretrained diffusion models using the Laplace approximation (a schematic sketch follows this list).
  • Simple yet effective techniques are suggested to reduce computational overhead during sampling.
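
Schematically, the post-hoc recipe looks like the sketch below, with stand-ins for the diffusion sampler, the semantic feature extractor, and the Laplace posterior: disagreement across a few weight draws, measured in feature space, flags likely-poor generations.

```python
import torch

def generate(weights, z):          # stand-in for a diffusion sampler
    return torch.tanh(z @ weights)

def semantic_features(x):          # stand-in for a semantic encoder
    return x.mean(dim=-1, keepdim=True)

torch.manual_seed(0)
w_map = torch.randn(16, 16)        # "pretrained" weights (the MAP estimate)
posterior_std = 0.05               # would come from the Laplace Hessian

z = torch.randn(1, 16)             # the same initial noise for every draw
feats = []
for _ in range(8):                 # a few samples from the weight posterior
    w = w_map + posterior_std * torch.randn_like(w_map)
    feats.append(semantic_features(generate(w, z)))
uncertainty = torch.stack(feats).var(dim=0).mean()
print(uncertainty)                 # high variance flags a likely-poor sample
```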

Source: Arxiv

Heavy-Tailed Linear Bandits: Huber Regression with One-Pass Update

  • The research focuses on addressing heavy-tailed noise in stochastic linear bandits.
  • Existing strategies like truncation and median-of-means are limited in applicability due to specific noise assumptions or bandit structures.
  • A recent work introduced a soft truncation method using adaptive Huber regression but faced computational challenges.
  • A new 'one-pass' algorithm based on online mirror descent reduces per-round computational costs significantly, offering near-optimal regret.
  • The method updates using only the current round's data, improving efficiency (a toy one-pass update is sketched after this list).
  • Per-round computational cost drops from O(t log T) to O(1).
  • The algorithm achieves a regret of order d · T^((1-ε)/(2(1+ε))) · sqrt(Σ_{t=1}^T ν_t²), where d is the dimension and ν_t is the moment of the reward noise at round t.
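
A toy version of the one-pass idea, using plain online gradient descent on the Huber loss (online mirror descent with a Euclidean mirror map); the threshold tau and the step size are illustrative, not the paper's tuned values.

```python
import numpy as np

def huber_grad(residual, tau):
    # Gradient of the Huber loss with respect to the prediction residual.
    return residual if abs(residual) <= tau else tau * np.sign(residual)

def one_pass_update(theta, x, reward, lr=0.05, tau=2.0):
    # Uses only the current round's (x, reward): O(d) time, constant in t.
    return theta + lr * huber_grad(reward - x @ theta, tau) * x

rng = np.random.default_rng(0)
d = 5
theta_star = rng.standard_normal(d)
theta = np.zeros(d)
for t in range(2000):
    x = rng.standard_normal(d) / np.sqrt(d)
    reward = x @ theta_star + rng.standard_t(df=2.5)   # heavy-tailed noise
    theta = one_pass_update(theta, x, reward)
```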

Source: Arxiv

Evolutionary Prediction Games

  • Predictive algorithms that serve many users can deliver different prediction quality to different user groups.
  • Users' responses to accurate predictions can create feedback loops affecting the model and user population.
  • The paper introduces evolutionary prediction games, a framework for modeling these feedback loops with evolutionary game theory (a replicator-dynamics toy follows this list).
  • In idealized settings with unlimited resources, the analysis predicts competition and competitive exclusion; under realistic constraints, coexistence becomes possible.
  • Stable coexistence, and even mutualistic symbiosis, between user groups is feasible under constraints such as finite data and limited compute.
  • Mechanisms to sustain coexistence are presented, and empirical evidence is provided to support the findings.
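
A replicator-dynamics toy for two user groups, with an invented payoff matrix in which each group fares better when it is rarer; this is the kind of condition under which coexistence, rather than exclusion, is the stable outcome.

```python
import numpy as np

def replicator_step(p, payoff, dt=0.01):
    fitness = payoff @ p                # per-group growth rate
    return p + dt * p * (fitness - p @ fitness)

# Invented payoffs: each group does better when the other group is common,
# e.g. because they compete less for the model's shared capacity.
payoff = np.array([[0.4, 1.0],
                   [1.0, 0.6]])
p = np.array([0.9, 0.1])                # initial shares of the two groups
for _ in range(20000):
    p = replicator_step(p, payoff)
print(p)                                # ~[0.4, 0.6]: stable coexistence
```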

Source: Arxiv

Pretraining Generative Flow Networks with Inexpensive Rewards for Molecular Graph Generation

  • Generative Flow Networks (GFlowNets) are used for molecular graph generation.
  • Previous methods restricted exploration by using predefined molecular fragments.
  • Atomic GFlowNets (A-GFNs) introduce a new generative model using individual atoms as building blocks for drug-like chemical space exploration.
  • Unsupervised pre-training on drug-like molecule datasets is proposed for A-GFNs, using inexpensive molecular descriptors such as drug-likeness and synthetic accessibility scores as rewards (a minimal reward sketch follows this list).
  • These rewards guide A-GFNs towards regions of chemical space with desired pharmacological properties.
  • Goal-conditioned finetuning helps adapt A-GFNs for specific target properties.
  • Pretraining A-GFN on a subset of the ZINC dataset proves effective on drug design tasks compared to baseline methods.
  • The code for A-GFN is available at https://github.com/diamondspark/AGFN.
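
For a flavor of what "inexpensive rewards" means, the sketch below scores a molecule with RDKit's QED drug-likeness estimate (RDKit assumed installed); the paper combines several such descriptors, including synthetic accessibility, into its reward.

```python
from rdkit import Chem
from rdkit.Chem import QED

def cheap_reward(smiles: str) -> float:
    """Descriptor-based reward: no wet lab, no expensive oracle."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return 0.0                # invalid molecules earn no reward
    return QED.qed(mol)           # drug-likeness score in [0, 1]

print(cheap_reward("CC(=O)Oc1ccccc1C(=O)O"))   # aspirin
```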

Source: Arxiv

Graph-Dependent Regret Bounds in Multi-Armed Bandits with Interference

  • Study examines multi-armed bandits with network interference affecting rewards based on local graph structure.
  • Proposed algorithm leverages graph characteristics to reduce regret in exponentially large action spaces.
  • A graph-dependent upper bound on cumulative regret is achieved, improving on previous results.
  • Lower bounds for bandits with diverse network interference types are established using graph properties.
  • Algorithm's optimality is demonstrated for dense and sparse graphs with near-optimal performance.
  • When the interference graph is unknown, a variant of the algorithm is Pareto optimal: no algorithm can outperform it across all scenarios.
  • Theoretical findings are supported by numerical experiments, illustrating superior performance over standard methods.

Source: Arxiv

Large Scale Multi-Task Bayesian Optimization with Large Language Models

  • Researchers propose a novel approach using large language models (LLMs) for multi-task Bayesian optimization.
  • The goal is to leverage experience from optimizing existing tasks to improve the efficiency of optimizing new tasks.
  • Existing approaches with multi-task Gaussian processes or deep kernel transfer show limited performance improvement at scale.
  • The new approach scales to about 1500 tasks and involves fine-tuning an LLM on high-quality solutions from Bayesian optimization.
  • The fine-tuned LLM generates initialization points for new task searches, creating a feedback loop (a schematic of the loop follows this list).
  • This method was evaluated in database query optimization and antimicrobial peptide design domains.
  • Results show that the LLM's initializations gradually improve, leading to better optimization performance.
  • The LLM eventually generates solutions for new tasks with fewer oracle calls, surpassing solutions from Bayesian optimization starting from scratch.
  • Overall, the feedback loop between Bayesian optimization and LLM fine-tuning compounds over time: reusing existing optimization trajectories accelerates new searches and yields superior results with fewer resources.
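
A schematic of the loop, with placeholders standing in for the LLM (llm_propose and fine_tune are hypothetical stubs) and for a real Bayesian optimization routine; only the control flow reflects the paper.

```python
import random

def bayes_opt(objective, init_points, budget=20):
    """Stand-in for a real single-task Bayesian optimization routine."""
    best = max(init_points, key=objective, default=random.random())
    for _ in range(budget):
        cand = min(max(best + random.gauss(0, 0.1), 0.0), 1.0)
        if objective(cand) > objective(best):
            best = cand
    return best

def llm_propose(task_id, memory):
    """Placeholder: the fine-tuned LLM would propose initializations here."""
    return [x for _, x in memory[-5:]] or [random.random()]

def fine_tune(memory):
    """Placeholder for fine-tuning the LLM on high-quality solutions."""
    pass

memory = []                        # (task, best solution) pairs
for task_id in range(30):          # the paper scales this loop to ~1500 tasks
    objective = lambda x, c=task_id / 30: -(x - c) ** 2
    x_best = bayes_opt(objective, llm_propose(task_id, memory))
    memory.append((task_id, x_best))
    if task_id % 10 == 9:
        fine_tune(memory)          # closes the feedback loop
```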

Source: Arxiv

Learning richness modulates equality reasoning in neural networks

  • Equality reasoning is a pervasive abstraction: whether two objects are the same can be judged regardless of what the objects are.
  • Same-different (SD) tasks are extensively studied to understand abstract reasoning in humans and animals.
  • Neural networks have shown proficiency in abstract reasoning, sparking interest in studying equality reasoning in these models.
  • However, findings on equality reasoning in neural networks remain inconsistent, with little consensus across studies.
  • A theory of equality reasoning in multi-layer perceptrons (MLPs) is developed to clarify the principles behind learning SD tasks.
  • Two types of behaviors, conceptual and perceptual, are identified in relation to equality reasoning.
  • Conceptual behavior is task-specific, efficient in learning, and less affected by irrelevant details.
  • Perceptual behavior is highly sensitive to irrelevant details and requires exhaustive training to learn the task.
  • The behavior of an MLP on equality reasoning tasks is driven by its learning richness, spanning rich and lazy training regimes (a toy richness knob is sketched after this list).
  • Rich-regime MLPs exhibit conceptual behavior, while lazy-regime MLPs exhibit perceptual behavior.
  • Experimental validation on vision SD tasks shows that rich feature learning promotes conceptual behavior and task success, identifying learning richness as the critical factor.
  • The study suggests that equality reasoning in humans and animals could also depend on learning richness in neural circuits.
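
One common way to expose the rich/lazy knob in code is output scaling (a toy sketch, not the paper's setup): multiplying the network output by alpha and the learning rate by 1/alpha² keeps large-alpha training near its initialization (lazy), while small alpha forces the first-layer features to move (rich).

```python
import torch
import torch.nn as nn

def feature_movement(alpha, steps=300, lr=0.2):
    """Train a scaled MLP on a toy same-different task and return how far
    the first-layer weights move; alpha controls learning richness."""
    torch.manual_seed(0)
    net = nn.Sequential(nn.Linear(2, 256), nn.ReLU(), nn.Linear(256, 1))
    w0 = net[0].weight.detach().clone()
    x = torch.randn(256, 2)
    y = (x[:, :1] * x[:, 1:] > 0).float()     # "same sign?" labels
    f0 = net(x).detach()                       # center: initial output is zero
    opt = torch.optim.SGD(net.parameters(), lr=lr / alpha**2)
    for _ in range(steps):
        loss = ((alpha * (net(x) - f0) - y) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return (net[0].weight - w0).norm().item()

# Small alpha (rich regime) moves features far more than large alpha (lazy).
print(feature_movement(alpha=0.5), feature_movement(alpha=50.0))
```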

Source: Arxiv

Unveiling the Role of Randomization in Multiclass Adversarial Classification: Insights from Graph Theory

  • Randomization is being explored to boost adversarial robustness in machine learning models, with a focus on multiclass classification.
  • Current theoretical analysis has mainly concentrated on binary classification, leaving gaps in understanding multiclass scenarios.
  • The paper draws on graph theory to analyze how randomization impacts adversarial risk minimization in multiclass settings.
  • The analysis centers on discrete data distributions, mapping adversarial risk minimization to set packing problems.
  • Three structural conditions on the data distribution's support are identified as crucial for randomization to enhance robustness.
  • Switching from deterministic to randomized solutions in certain data distributions notably decreases optimal adversarial risk.
  • The research underscores the significant role of randomization in fortifying multiclass classification against adversarial attacks.
