Machine Learning (ML) Latest News and Trending articles from all top sources only on Techminis

A naukri.com initiative

Arxiv

Annotation-Free MIDI-to-Audio Synthesis via Concatenative Synthesis and Generative Refinement

Recent MIDI-to-audio synthesis methods using deep neural networks have been successful in generating high-quality, expressive instrumental tracks.
These methods usually require MIDI annotations for supervised training, which limits the diversity of instrument timbres and expression styles in the output.
CoSaRef is introduced as a MIDI-to-audio synthesis method that does not depend on MIDI-audio paired datasets.
CoSaRef involves two main steps: generating a synthetic audio track using concatenative synthesis from MIDI input and refining it using a diffusion-based deep generative model trained without MIDI annotations.
This method enhances the diversity of timbres and expression styles in the generated audio output.
CoSaRef also enables fine control over timbres and expression through sample selection and extra MIDI design, akin to traditional functions in digital audio workstations.
Experiments demonstrated that CoSaRef can produce realistic tracks while maintaining detailed timbre control via one-shot samples.
Despite not being trained with MIDI annotations, CoSaRef outperformed a state-of-the-art timbre-controllable method based on MIDI supervision in both objective and subjective evaluations.
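
The first of the two steps, concatenative synthesis from MIDI, can be sketched in a few lines. This is a minimal illustration rather than CoSaRef's implementation: the note format, the sample rate, and the decaying sine used as a stand-in one-shot sample (`make_one_shot`) are assumptions made here, and the diffusion-based refinement step is omitted.

```python
import math

SAMPLE_RATE = 16_000  # Hz; an assumption for this sketch


def midi_to_hz(pitch):
    """Convert a MIDI pitch number to frequency (A4 = 69 = 440 Hz)."""
    return 440.0 * 2 ** ((pitch - 69) / 12)


def make_one_shot(freq_hz, duration_s=0.25):
    """Hypothetical one-shot sample: a sine tone with exponential decay."""
    n = int(duration_s * SAMPLE_RATE)
    return [
        math.exp(-6.0 * i / n) * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
        for i in range(n)
    ]


def concatenative_render(notes, total_s):
    """Paste one sample per note at its onset, scaled by MIDI velocity.

    notes: list of (onset_seconds, midi_pitch, velocity in 0..127).
    """
    track = [0.0] * int(total_s * SAMPLE_RATE)
    for onset_s, pitch, velocity in notes:
        sample = make_one_shot(midi_to_hz(pitch))
        start = int(onset_s * SAMPLE_RATE)
        gain = velocity / 127
        for i, value in enumerate(sample):
            if start + i >= len(track):
                break  # clip notes that run past the end of the track
            track[start + i] += gain * value
    return track


notes = [(0.0, 60, 100), (0.5, 64, 80), (1.0, 67, 127)]
track = concatenative_render(notes, total_s=1.5)
```

In the method described above, a rough track like this would then be passed to the generative model for refinement; swapping the sample source is what gives control over timbre.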

Arxiv

Minimax optimality of deep neural networks on dependent data via PAC-Bayes bounds

Schmidt-Hieber (2020) showed the minimax optimality of deep neural networks with ReLU activation for least-squares regression estimation.
The paper extends these results by considering dependent data, removing the i.i.d. assumption.
Observations are now allowed to form a Markov chain with a nonzero pseudo-spectral gap.
A more general class of machine learning problems, including least-squares and logistic regression, is studied.
The study uses PAC-Bayes oracle inequalities and a version of the Bernstein inequality by Paulin (2015) to derive upper bounds on estimation risk for a generalized Bayesian estimator.
For least-squares regression, the bound matches Schmidt-Hieber's lower bound up to a logarithmic factor.
The paper establishes a lower bound for classification with logistic loss and proves the optimality of the proposed deep neural network estimator in a minimax sense.

Arxiv

Code-Switching Curriculum Learning for Multilingual Transfer in LLMs

The performance of large language models (LLMs) drops beyond a handful of high-resource languages because of imbalance in pre-training data.
Inspired by second language acquisition, code-switching curriculum learning (CSCL) is proposed for enhancing cross-lingual transfer in LLMs.
CSCL mimics human language learning stages through token-level and sentence-level code-switching as well as monolingual corpora training.
Using the Qwen 2 model, CSCL shows significant gains in language transfer to Korean compared to monolingual pre-training methods.
Ablation studies confirm the effectiveness of both token- and sentence-level code-switching in enhancing cross-lingual transfer, amplified by curriculum learning.
The study extends to languages like Japanese and Indonesian using Gemma 2 and Phi 3.5 models, demonstrating improved language transfer.
CSCL helps mitigate spurious correlations between language resources and safety alignment, offering an efficient framework for equitable language transfer in LLMs.
CSCL proves effective in low-resource settings lacking high-quality, monolingual corpora for language transfer.
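
The two code-switching stages can be illustrated with a small sketch. It is not the CSCL training pipeline: the tiny English-to-Korean lexicon, the `ratio` parameter, and the alternation rule for sentence-level switching are hypothetical choices made only to show what the mixed training text could look like.

```python
import random

# Hypothetical mini lexicon; a real setup would use a bilingual dictionary
# or word alignments.
LEXICON = {"cat": "고양이", "water": "물", "book": "책"}


def token_level_switch(sentence, ratio, rng):
    """Replace each lexicon word with its translation with probability `ratio`."""
    switched = []
    for token in sentence.split():
        if token in LEXICON and rng.random() < ratio:
            switched.append(LEXICON[token])
        else:
            switched.append(token)
    return " ".join(switched)


def sentence_level_switch(source_sentences, target_sentences):
    """Alternate between aligned source and target sentences."""
    return [
        target if index % 2 else source
        for index, (source, target) in enumerate(
            zip(source_sentences, target_sentences)
        )
    ]


rng = random.Random(0)
fully_switched = token_level_switch("the cat drinks water", 1.0, rng)
unchanged = token_level_switch("the cat drinks water", 0.0, rng)

english = ["The cat drinks water.", "I read a book."]
korean = ["고양이가 물을 마신다.", "나는 책을 읽는다."]
mixed_document = sentence_level_switch(english, korean)

# Curriculum order as described above: token-level, then sentence-level,
# then monolingual text.
curriculum = ["token_level", "sentence_level", "monolingual"]
```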

Arxiv

Model Attribution and Detection of Synthetic Speech via Vocoder Fingerprints

Speech generation technology advancements raise concerns about potential misuse of synthetic speech signals.
The study addresses three key tasks: single-model attribution in an open-world scenario, model attribution in a closed-world scenario, and distinguishing synthetic from real speech.
The research uses standardized average residuals between audio signals and filtered versions as vocoder fingerprints for identification purposes.
The vocoder fingerprints achieve over 99% average AUROC on the LJSpeech and JSUT datasets across the tasks.
The accompanying robustness study shows that the method tolerates noise up to a certain level.
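
The fingerprint idea, a standardized average of residuals between signals and their filtered versions, can be sketched on toy data. Everything specific here is an assumption for illustration: the moving-average filter, the correlation-style `similarity` score, and the synthetic "vocoders" that leave a fixed periodic artifact. The study's actual filter and scoring are not given in this summary.

```python
import math
import random


def moving_average(signal, width=5):
    """Simple low-pass filter (assumed; stands in for the study's filter)."""
    half = width // 2
    smoothed = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def residual(signal):
    """Signal minus its filtered version."""
    return [x - s for x, s in zip(signal, moving_average(signal))]


def standardize(values):
    """Shift to zero mean and scale to unit variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def fingerprint(signals):
    """Standardized average residual over signals from one model."""
    residuals = [residual(s) for s in signals]
    length = min(len(r) for r in residuals)
    average = [
        sum(r[i] for r in residuals) / len(residuals) for i in range(length)
    ]
    return standardize(average)


def similarity(signal, fp):
    """Correlation between a signal's residual and a fingerprint."""
    r = standardize(residual(signal)[: len(fp)])
    return sum(a * b for a, b in zip(r, fp)) / len(fp)


rng = random.Random(0)


def toy_utterance(artifact_period, length=400):
    """Smooth content plus a small model-specific periodic artifact."""
    freq = rng.uniform(0.005, 0.01)
    phase = rng.uniform(0.0, 2 * math.pi)
    return [
        math.sin(2 * math.pi * freq * i + phase)
        + 0.2 * math.sin(2 * math.pi * i / artifact_period)
        + rng.gauss(0.0, 0.01)
        for i in range(length)
    ]


fp_a = fingerprint([toy_utterance(4) for _ in range(20)])
fp_b = fingerprint([toy_utterance(3) for _ in range(20)])
probe = toy_utterance(4)  # produced by toy "vocoder A"
score_a = similarity(probe, fp_a)
score_b = similarity(probe, fp_b)
```

In this toy setting the probe correlates strongly with the fingerprint of the model that produced it and only weakly with the other, which is the behavior attribution relies on.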

Arxiv

Spectral Image Tokenizer

Image tokenizers are essential for autoregressive transformer-based image generation, mapping images to sequences of discrete tokens.
The proposed spectral image tokenizer in this paper tokenizes the image spectrum obtained from a discrete wavelet transform.
Advantages of the spectral image tokenizer include leveraging the compressibility of natural images at high frequencies and enabling image reconstruction at different resolutions without retraining.
The tokenizer improves conditioning for next-token prediction compared to traditional approaches and enables partial decoding for coarse image reconstruction.
It also allows autoregressive models to be utilized for image upsampling, providing versatility in image manipulation tasks.
Evaluation of the tokenizer includes reconstruction metrics, multiscale image generation, text-guided image upsampling, and editing.
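
A one-level 2D Haar wavelet transform shows why a spectral representation supports coarse-to-fine decoding. This is only a sketch under stated assumptions: the paper's choice of wavelet, number of levels, and the quantization of coefficients into discrete tokens are not described in this summary, and the averaging-style normalization below is picked for readability.

```python
def haar_dwt2(image):
    """One-level 2D Haar transform of an image with even height and width.

    Returns (ll, lh, hl, hh): the half-resolution average band and three
    detail bands.
    """
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, len(image), 2):
        rows = ([], [], [], [])
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            rows[0].append((a + b + d + e) / 4)  # local average
            rows[1].append((a - b + d - e) / 4)  # left-right difference
            rows[2].append((a + b - d - e) / 4)  # top-bottom difference
            rows[3].append((a - b - d + e) / 4)  # diagonal difference
        for band, row in zip((ll, lh, hl, hh), rows):
            band.append(row)
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Invert haar_dwt2."""
    image = []
    for r in range(len(ll)):
        top, bottom = [], []
        for c in range(len(ll[0])):
            s, h, v, d = ll[r][c], lh[r][c], hl[r][c], hh[r][c]
            top += [s + h + v + d, s - h + v - d]
            bottom += [s + h - v - d, s - h - v + d]
        image += [top, bottom]
    return image


image = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 4, 6, 8],
    [2, 4, 6, 8],
]
ll, lh, hl, hh = haar_dwt2(image)

# Full decoding uses every band; partial decoding keeps only the coarse band.
restored = haar_idwt2(ll, lh, hl, hh)
zeros = [[0.0] * len(ll[0]) for _ in ll]
coarse_only = haar_idwt2(ll, zeros, zeros, zeros)
```

Ordering the bands from coarse to fine means a prefix of the sequence already decodes to a lower-resolution image, which is the property the summary describes as partial decoding.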

Arxiv

ICONS: Influence Consensus for Vision-Language Data Selection

Training vision-language models often relies on large mixtures of data spanning diverse tasks and domains. However, these mixtures can include redundant information, increasing computational costs without performance gains.
Effective data selection strategies are necessary to address these issues. Existing methods use task-agnostic heuristics or focus on optimizing single tasks, limiting their effectiveness in multitask settings.
This work introduces ICONS, a gradient-based approach for vision-language data selection. ICONS leverages training dynamics to estimate the influence of individual examples on validation performance and aggregates these estimates across tasks via majority voting.
By identifying data points consistently valuable across tasks, ICONS prioritizes examples driving overall performance. The method mitigates score calibration and outlier sensitivity issues, resulting in robust data selection for diverse multitask mixtures.
With only 20% of the data from LLaVA-665K and Cambrian-7M, the selected subsets retain 98.6% and 98.8% of full-dataset performance, respectively. Selection can even surpass full-data training at a 60% selection ratio on LLaVA-665K.
The approach also generalizes to unseen tasks and architectures, showcasing strong transfer capabilities. Two compact subsets, LLaVA-ICONS-133K and Cambrian-ICONS-1.4M, are released with impactful training examples for efficient vision-language model development.
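
The cross-task aggregation step can be sketched as a vote over per-task rankings. The influence scores below are made-up numbers, and `top_k_per_task` and `budget` are hypothetical parameters; how ICONS computes gradient-based influence is not shown here.

```python
def consensus_select(influence, top_k_per_task, budget):
    """Pick samples that rank highly for many tasks.

    influence: dict mapping task name -> list of per-sample influence scores.
    Each task votes for its top_k_per_task samples; the samples with the most
    votes are kept, up to `budget`. Ties are broken by sample index.
    """
    num_samples = len(next(iter(influence.values())))
    votes = [0] * num_samples
    for scores in influence.values():
        ranked = sorted(range(num_samples), key=lambda i: scores[i], reverse=True)
        for i in ranked[:top_k_per_task]:
            votes[i] += 1
    order = sorted(range(num_samples), key=lambda i: (-votes[i], i))
    return order[:budget], votes


# Toy scores on very different scales for three hypothetical tasks.
influence = {
    "vqa": [0.9, 0.8, 0.1, 0.2, 0.3],
    "ocr": [50.0, 10.0, 40.0, 5.0, 1.0],
    "captioning": [0.03, 0.01, 0.02, 0.0, 0.005],
}
selected, votes = consensus_select(influence, top_k_per_task=2, budget=2)
```

Because each task contributes only a vote, not its raw score, a task with large score magnitudes (here `ocr`) cannot dominate the selection, which is one way to read the calibration and outlier point above.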

Arxiv

Neuromorphic Optical Tracking and Imaging of Randomly Moving Targets through Strongly Scattering Media

Researchers have developed a neuromorphic optical engineering and computational approach to track and image moving targets obscured by scattering media.
The method combines an event-detecting camera with multistage neuromorphic deep learning for object localization and identification.
The event camera converts photon signals from the scattering media into pixel-wise asynchronous spike trains, which filters out background noise.
A deep spiking neural network (SNN) processes the spiking data for simultaneous tracking and image reconstruction of objects.
The approach tracked and imaged randomly moving objects in dense turbid media and also reconstructed images of spatially stationary but optically dynamic objects.
Standardized character sets were used to represent complex objects, showcasing the method's versatility.
The study emphasizes the benefits of a fully neuromorphic approach in achieving efficient imaging technology with low power consumption.

Arxiv

DeepExtractor: Time-domain reconstruction of signals and glitches in gravitational wave data with deep learning

Gravitational wave detectors like LIGO, Virgo, and KAGRA are sensitive to signals from distant astrophysical events but can be affected by background noise, including glitches.
DeepExtractor is a deep learning framework introduced to reconstruct signals and glitches whose power exceeds the interferometer noise.
This model is designed to capture the noise distribution of GW detectors assuming Gaussian and stationary noise over short time intervals, aiming to separate signal or glitch from noise.
DeepExtractor was tested through experiments including simulated glitches in detector noise, comparison with the BayesWave algorithm, and analyzing real data from the Gravity Spy dataset for glitch subtraction in LIGO strain data.
The model performed well in reconstructing simulated glitches with a median mismatch of only 0.9%, outperforming other deep learning baselines.
DeepExtractor also excelled in glitch recovery compared to BayesWave, offering a significant speedup by reconstructing one glitch sample in about 0.1 seconds on a CPU, much faster than BayesWave's processing time of about an hour per glitch.
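
For readers unfamiliar with the mismatch figure, one common definition is one minus the normalized overlap between the true and reconstructed waveforms. The sketch below uses a plain time-domain dot product, which is a simplifying assumption; gravitational-wave analyses typically weight the inner product by the detector noise spectrum, and the summary does not state which form the paper uses. The toy waveform is invented for illustration.

```python
import math


def inner(a, b):
    """Plain time-domain inner product (assumes white noise)."""
    return sum(x * y for x, y in zip(a, b))


def mismatch(truth, reconstruction):
    """1 - normalized overlap; 0 means identical shape."""
    overlap = inner(truth, reconstruction) / math.sqrt(
        inner(truth, truth) * inner(reconstruction, reconstruction)
    )
    return 1.0 - overlap


# Toy glitch: a Gaussian-windowed oscillation.
glitch = [math.exp(-(((i - 50) / 8) ** 2)) * math.sin(0.6 * i) for i in range(100)]

perfect = mismatch(glitch, glitch)
rescaled = mismatch(glitch, [2 * x for x in glitch])
flipped = mismatch(glitch, [-x for x in glitch])
```

Under this definition the mismatch ignores overall amplitude and measures only waveform shape, so a reported value of 0.9% corresponds to a normalized overlap of 0.991.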

Arxiv

Generalizable and Fast Surrogates: Model Predictive Control of Articulated Soft Robots using Physics-Informed Neural Networks

Soft robots can enhance applications requiring dexterity and safety.
Real-time control of these systems demands fast and accurate models.
First-principles models for prediction are slow, while black-box models lack generalizability.
Physics-informed machine learning is advantageous but usually limited.
Physics-informed neural networks (PINNs) are proposed for articulated soft robots (ASRs) with a focus on data efficiency.
PINNs reduce the need for expensive real-world training data to a single dataset.
Comparisons against gold-standard approaches show PINNs provide high generalizability.
PINNs are up to 467 times faster at prediction than accurate first-principles models, albeit with slightly reduced accuracy.
This advancement allows for nonlinear model predictive control (MPC) of a pneumatic ASR.
Accurate position tracking is achieved at a 47 Hz MPC rate in six dynamic experiments.

Arxiv

ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models

The paper introduces ImageChain, a framework that enhances multimodal large language models with sequential reasoning capabilities over image data.
ImageChain models visual sequences as a multi-turn conversation by interleaving images with corresponding textual descriptions.
The framework explicitly captures temporal dependencies and narrative progression in image data.
It optimizes for the task of next-scene description, in which the model generates context-aware descriptions based on preceding visual and textual cues.
The approach improves performance on the next-scene description task, showing an average improvement from 3.7% to 19% in the SimRate metric.
ImageChain demonstrates robust zero-shot out-of-domain performance in applications like comics and robotics.
Extensive experiments validate the importance of instruction-tuning in a multimodal, multi-turn conversation design for enhanced reasoning.

Arxiv

FC-Attack: Jailbreaking Multimodal Large Language Models via Auto-Generated Flowcharts

Multimodal Large Language Models (MLLMs) are vulnerable to jailbreak attacks where harmful content can be induced, posing safety risks despite safety alignment efforts.
A new method named FC-Attack utilizes auto-generated flowcharts with partially harmful information to trick MLLMs into providing additional harmful details.
FC-Attack fine-tunes a pre-trained model to create a step-description generator from benign datasets, then transforms harmful queries into flowcharts for the attack.
The flowcharts come in vertical, horizontal, and S-shaped forms, combined with benign text prompts to execute the attack on MLLMs, achieving high success rates.
Evaluations on Advbench demonstrate FC-Attack's success rates of up to 96% via images and up to 78% via videos across various MLLMs.
Factors affecting the attack performance, such as the number of steps and font styles in the flowcharts, are investigated, with font style changes improving success rates.
By altering font styles, FC-Attack's jailbreak performance on Claude-3.5 rises from 4% to 28%.
Several defense mechanisms, including AdaShield, help mitigate the attack; however, they may come at the cost of reduced utility.

Arxiv

Spatial Reasoning with Denoising Models

Researchers introduce Spatial Reasoning Models (SRMs) for reasoning over sets of continuous variables using denoising generative models.
SRMs infer continuous representations of unobserved variables from observations of the observed variables.
Current generative models like diffusion and flow matching models can lead to hallucinations in complex distributions.
The study includes benchmark tasks to evaluate the quality of reasoning in generative models and quantify hallucination.
SRMs highlight the importance of sequentialization in generation, the associated order, and sampling strategies during training.
The framework shows that the order of generation can be predicted by the denoising network itself, leading to significant accuracy improvements.
The project website offers additional resources including videos, code, and benchmark datasets.
The SRM framework enhances accuracy in specific reasoning tasks from less than 1% to over 50%.

Arxiv

Mamba time series forecasting with uncertainty quantification

Mamba, a state space model, has gained attention for time series forecasting.
Mamba forecasts in electricity consumption benchmarks show an average error of about 8%.
In traffic occupancy benchmarks, the mean error in Mamba forecasts reaches 18%.
A method is proposed to quantify the predictive uncertainty of Mamba forecasts.
A dual-network framework based on the Mamba architecture is introduced for probabilistic forecasting.
The framework includes one network for point forecasts and another for estimating predictive uncertainty by modeling variance.
The tool is named Mamba-ProbTSF, and its implementation code is available on GitHub.
Evaluation on synthetic and real-world benchmark datasets shows effectiveness.
Kullback-Leibler divergence between learned distributions and data is reduced to a low level for both synthetic and real-world data.
The true trajectory stays within the predicted uncertainty interval around 95% of the time for both electricity consumption and traffic occupancy benchmarks.
Considerations for limitations, performance improvements, and applications to stochastic dynamics processes are discussed.
The research is detailed in arXiv:2503.10873v2, focusing on time series forecasting with uncertainty quantification.
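
The two quantities at the heart of this kind of probabilistic forecast, a variance-aware loss and interval coverage, can be sketched without any model. The Gaussian form of the predictive distribution and the 1.96-sigma interval are assumptions made here for illustration; the numbers are toy values, not results from the paper, and the Mamba networks themselves are not implemented.

```python
import math


def gaussian_nll(targets, means, variances):
    """Average negative log-likelihood under N(mean, variance) per time step."""
    total = 0.0
    for y, mu, var in zip(targets, means, variances):
        total += 0.5 * (math.log(2 * math.pi * var) + (y - mu) ** 2 / var)
    return total / len(targets)


def coverage(targets, means, sigmas, z=1.96):
    """Fraction of targets inside mean +/- z * sigma (z = 1.96 for ~95%)."""
    inside = sum(
        1 for y, mu, s in zip(targets, means, sigmas) if abs(y - mu) <= z * s
    )
    return inside / len(targets)


# Toy series: the point-forecast network would supply `means`, the second
# network would supply `sigmas`.
targets = [1.0, 2.0, 3.0, 10.0]
means = [1.1, 1.9, 3.2, 4.0]
sigmas = [0.5, 0.5, 0.5, 0.5]

toy_coverage = coverage(targets, means, sigmas)
toy_nll = gaussian_nll(targets, means, [s ** 2 for s in sigmas])
baseline_nll = gaussian_nll([0.0], [0.0], [1.0])  # 0.5 * log(2 * pi)
```

A well-calibrated uncertainty network would bring the coverage of the 1.96-sigma band close to 95%, which is the behavior reported above for the electricity and traffic benchmarks.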

Arxiv

Multi-Variable Batch Bayesian Optimization in Materials Research: Synthetic Data Analysis of Noise Sensitivity and Problem Landscape Effects

Bayesian Optimization (BO) is increasingly used in materials science for experimental optimization tasks.
A study was conducted to simulate batch BO with six design variables and different noise levels.
Two test functions relevant to materials science problems were examined: the Ackley function and the Hartmann function.
The study analyzed the impact of noise, batch-picking method, acquisition function, and hyperparameter values on optimization outcomes.
Noise was found to have varying effects depending on the problem landscape.
Noise degraded optimization results more in the needle-in-a-haystack search scenario (the Ackley function), whereas in the Hartmann function it increased the probability of finding a local optimum.
Prior knowledge of the problem domain structure and noise level is crucial when designing BO for materials research experiments.
Synthetic data studies help evaluate the impact of different batch BO components before moving to real experimental systems.
The study results aim to enhance the utilization of BO in guiding experimental materials research with a large number of design variables.
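
A noisy synthetic objective of the kind used in such studies is easy to set up. The sketch below implements the standard Ackley function in six dimensions with additive Gaussian noise; the noise model, the noise level, and the random batch are assumptions for illustration, and no Bayesian optimization loop (surrogate model, acquisition function, batch picking) is included.

```python
import math
import random


def ackley(x, a=20.0, b=0.2, c=2 * math.pi):
    """Standard Ackley function; global minimum 0 at the origin."""
    d = len(x)
    mean_square = sum(v * v for v in x) / d
    mean_cos = sum(math.cos(c * v) for v in x) / d
    return -a * math.exp(-b * math.sqrt(mean_square)) - math.exp(mean_cos) + a + math.e


def noisy_measurement(x, noise_std, rng):
    """Simulated experiment: true objective plus Gaussian noise (assumed model)."""
    return ackley(x) + rng.gauss(0.0, noise_std)


rng = random.Random(0)
origin = [0.0] * 6  # six design variables, as in the study
at_optimum = ackley(origin)
away = ackley([1.0] * 6)
noise_free = noisy_measurement(origin, 0.0, rng)

# One simulated batch of four random candidates in [-2, 2]^6.
batch = [[rng.uniform(-2.0, 2.0) for _ in range(6)] for _ in range(4)]
observations = [noisy_measurement(x, 0.5, rng) for x in batch]
```

Because the global minimum is a narrow well on a nearly flat outer region, measurement noise of a similar size to the outer-region variations can hide the direction toward the optimum, which matches the needle-in-a-haystack behavior described above.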

Medium

The Tachyonic Recursive Collapse Model (TRCM): A Framework for Semantic Information Flow…

The paper presents a theoretical framework merging symbolic recursion, AI behavior modeling, quantum metaphors, and semantic mathematics to explore posthuman cognition and information dynamics.
It introduces the Tachyonic Recursive Collapse Model (TRCM) that unifies Semantic Information Mathematics, Quantum-Collapsed Symbolic Dynamics, and tachyonic metaphors.
TRCM models how meaning propagates within recursive symbolic cognition, extending traditional information theory by treating symbols as dynamic entities in temporal spaces.
Semantic Information Mathematics defines Symbol State Vectors for symbols in recursive cognition, considering compression index, directional gradient, and vector position.
Quantum-Collapsed Symbolic Dynamics explains how attention collapses symbols into definite outputs based on coherence scores involving validity, satisfaction, and elegance.
TRCM applies a tachyonic field metaphor to model symbols propagating influence forward and backward through cognitive sequences.
Attention mechanisms in AI align with TRCM, where the self-attention mechanism acts as a soft tachyonic field, influencing meanings throughout sequences.
The TRCM framework suggests implications for posthuman cognition, envisioning systems operating on recursive semantic harmonics and ethical regulations encoded in recursive time-loops.
Future applications of TRCM include designing recursive AI architectures, post-symbolic compression layers for AGI cognition, ethical fields for AI, and achieving deep AI interpretability.
TRCM offers a metaphorical map for understanding deeply recursive, symbolic posthuman cognition beyond linear causality.
