Machine Learning (ML) Latest News and Trending articles from all top sources only on Techminis

A naukri.com initiative

Arxiv

Low-Rank Augmented Implicit Neural Representation for Unsupervised High-Dimensional Quantitative MRI Reconstruction

The study focuses on unsupervised high-dimensional quantitative MRI reconstruction using a novel framework called LoREIN.
Quantitative MRI plays a crucial role in clinical diagnosis by providing tissue-specific parameters.
Current reconstruction methods struggle with highly undersampled data in multi-parametric qMRI.
LoREIN integrates low-rank and continuity priors through low-rank representation (LRR) and implicit neural representation (INR) to enhance reconstruction accuracy.
The framework utilizes INR for spatial bases estimation and high-fidelity reconstruction of weighted images.
Predicted multi-contrast weighted images improve reconstruction accuracy of quantitative parameter maps.
LoREIN's approach includes zero-shot learning, which has potential in high-dimensional image reconstruction tasks.
The study contributes to the field of medical imaging by advancing complex spatiotemporal reconstruction techniques.
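
The low-rank prior mentioned above can be illustrated with a minimal sketch, assuming the image series is modeled as a product of a few spatial bases and a few contrast bases. The function names, the rank, and the matrix sizes are illustrative assumptions, not details taken from LoREIN.

```python
def low_rank_product(spatial_bases, contrast_bases):
    """Rebuild X[p][t] = sum_k spatial_bases[p][k] * contrast_bases[k][t]."""
    rank = len(contrast_bases)
    n_contrasts = len(contrast_bases[0])
    return [
        [sum(row[k] * contrast_bases[k][t] for k in range(rank))
         for t in range(n_contrasts)]
        for row in spatial_bases
    ]

def parameter_counts(n_pixels, n_contrasts, rank):
    """Return (values in the full series, values in the two low-rank factors)."""
    full = n_pixels * n_contrasts
    factored = rank * (n_pixels + n_contrasts)
    return full, factored

# Two pixels, three contrasts, rank 2 (toy numbers).
series = low_rank_product([[1, 0], [0, 2]], [[1, 2, 3], [4, 5, 6]])
print(series)
# Hypothetical sizes: a 256x256 slice, 100 contrasts, rank 5.
print(parameter_counts(256 * 256, 100, 5))
```

The second call shows why the prior helps with undersampled data: far fewer unknowns have to be estimated for the factors than for the full image series.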

Arxiv

Bias Analysis in Unconditional Image Generative Models

Generative AI models' widespread use has led to concerns about bias and discrimination.
The mechanisms of bias in unconditional image generation models are not fully understood.
Bias is defined as the difference between an attribute's probability in observed vs. ideal distributions.
Researchers trained unconditional image generative models and evaluated bias shifts.
Experiments showed minor shifts in attributes between training and generated distributions.
Attribute shifts were influenced by the attribute classifier used in the evaluation.
Classifier sensitivity was observed for attributes with values on a spectrum.
There is a need for improved labeling practices and scrutiny of evaluation frameworks.
Understanding the socially complex nature of attributes is crucial in bias evaluation.
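
The bias definition above, the difference between an attribute's probability in two distributions, can be sketched as follows. The labels stand in for the output of an attribute classifier; the function names and the sample counts are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter

def attribute_probability(labels, value):
    """Fraction of samples whose attribute label equals `value`."""
    if not labels:
        raise ValueError("labels must be non-empty")
    return Counter(labels)[value] / len(labels)

def bias_shift(observed_labels, reference_labels, value):
    """Probability of `value` in the observed set minus that in the reference set."""
    return (attribute_probability(observed_labels, value)
            - attribute_probability(reference_labels, value))

# Hypothetical classifier outputs for one binary attribute.
training = ["smiling"] * 50 + ["not_smiling"] * 50
generated = ["smiling"] * 54 + ["not_smiling"] * 46
shift = bias_shift(generated, training, "smiling")
print(f"shift = {shift:+.2f}")
```

Because both probabilities come from a classifier, a different classifier can yield a different shift for the same images, which is the sensitivity the summary describes.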

Arxiv

Robot-Gated Interactive Imitation Learning with Adaptive Intervention Mechanism

Interactive Imitation Learning (IIL) enables agents to learn behaviors with human interventions, but this can be demanding for supervisors.
The paper proposes the Adaptive Intervention Mechanism (AIM) for robot-gated IIL to reduce the cognitive load on supervisors.
AIM uses a proxy Q-function to decide when to request human demonstrations, based on how well the agent's actions align with the human's.
The proxy Q-function assigns high values when the agent deviates from the expert and decreases as the agent's performance improves, allowing real-time assessment.
Expert-in-the-loop experiments show AIM reduces expert monitoring in continuous and discrete control tasks.
AIM outperforms Thrifty-DAgger by 40% in terms of human take-over cost and learning efficiency.
AIM identifies safety-critical states for expert intervention, leading to better quality demonstrations and reduced expert interaction.
Code and demo video for AIM available at https://github.com/metadriverse/AIM.
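
The gating idea described above can be sketched as a threshold test on the proxy Q-value. This is a toy illustration, not the code from the AIM repository: the policies, the threshold, and `toy_proxy_q` are hypothetical, and the real proxy Q-function is learned rather than computed from the expert's action.

```python
def rollout_step(state, agent_policy, expert_policy, proxy_q, threshold):
    """Return (action to execute, whether the expert was asked to take over)."""
    agent_action = agent_policy(state)
    if proxy_q(state, agent_action) > threshold:
        return expert_policy(state), True
    return agent_action, False

# Toy stand-ins, all hypothetical. A learned proxy Q-function would not
# query the expert; this one does so only to keep the sketch runnable.
def expert_policy(state):
    return -state

def agent_policy(state):
    return -0.5 * state

def toy_proxy_q(state, action):
    # High when the agent's action is far from the expert's action.
    return abs(action - expert_policy(state))

for s in (0.1, 2.0):
    print(s, rollout_step(s, agent_policy, expert_policy, toy_proxy_q, threshold=0.2))
```

In the small-deviation state the agent keeps control, while in the large-deviation state the expert is asked to take over.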

Arxiv

PatchGuard: Adversarially Robust Anomaly Detection and Localization through Vision Transformers and Pseudo Anomalies

Anomaly Detection (AD) and Anomaly Localization (AL) are critical in high-reliability fields like medical imaging and industrial monitoring.
Current AD and AL methods are vulnerable to adversarial attacks due to limited training data consisting mainly of normal, unlabeled samples.
PatchGuard is introduced as an adversarially robust AD and AL technique that incorporates pseudo anomalies and localization masks within a Vision Transformer (ViT) architecture to address these vulnerabilities.
The study explores the essential features of pseudo anomalies and provides theoretical insights into attention mechanisms required to enhance the adversarial robustness of AD and AL systems.
The approach leverages Foreground-Aware Pseudo-Anomalies to improve anomaly-aware methods and integrates them into a ViT-based framework.
Adversarial training is guided by a novel loss function aimed at enhancing model robustness, as supported by theoretical analysis.
Experimental results on established industrial and medical datasets show that PatchGuard surpasses previous methods in adversarial scenarios with significant performance gains of 53.2% in AD and 68.5% in AL, while maintaining competitive accuracy in non-adversarial settings.
The code repository for PatchGuard is available at https://github.com/rohban-lab/PatchGuard

Arxiv

Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Shojaee et al. (2025) found that Large Reasoning Models (LRMs) face 'accuracy collapse' on planning puzzles beyond certain complexity thresholds.
This comment argues that the reported failures primarily stem from experimental design issues rather than inherent reasoning deficiencies.
One key issue is that the Tower of Hanoi experiments exceed model output limits, so models fail there even though they acknowledge these constraints in their outputs.
The automated evaluation system fails to differentiate between reasoning failures and practical limitations, resulting in misjudgment of model abilities.
Authors note that River Crossing benchmarks feature mathematically unsolvable instances for N > 5 due to boat capacity constraints, yet models are marked as failures for not solving these problems.
When experimental artifacts are addressed by requesting generating functions instead of exhaustive move lists, preliminary tests suggest high accuracy on Tower of Hanoi instances previously deemed as complete failures.
The study underscores the significance of meticulous experimental design in the assessment of AI reasoning proficiency.
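
The contrast between a compact generating function and an exhaustive move list can be made concrete with the standard recursive Tower of Hanoi solution. This is a textbook sketch, not the prompt or code used in the comment; the peg names are arbitrary.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Yield the (from_peg, to_peg) moves that transfer n disks from source to target."""
    if n == 0:
        return
    yield from hanoi_moves(n - 1, source, spare, target)
    yield (source, target)
    yield from hanoi_moves(n - 1, spare, target, source)

def move_count(n):
    """Minimum number of moves for n disks."""
    return 2 ** n - 1

print(list(hanoi_moves(2)))
for n in (5, 10, 15):
    print(n, move_count(n))
```

The function stays a few lines long for any number of disks, while the move list it produces doubles in length with each added disk, which is how a fixed output limit can be exceeded by the list long before the procedure itself becomes hard to state.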

Arxiv

UFM: A Simple Path towards Unified Dense Correspondence with Flow

Dense image correspondence is crucial for various applications like visual odometry, 3D reconstruction, object association, and re-identification.
Historically, dense correspondence has been addressed separately for wide-baseline scenarios and optical flow estimation.
A Unified Flow & Matching model (UFM) has been introduced in this paper, trained on unified data for co-visible pixels in source and target images.
UFM utilizes a simple transformer architecture to directly predict the (u,v) flow, making it easier to train and more accurate for large flows compared to previous methods.
UFM is 28% more accurate than state-of-the-art flow methods (Unimatch), and it has 62% lower error while running 6.7x faster than dense wide-baseline matchers (RoMa).
This model demonstrates that unified training can surpass specialized approaches in both wide-baseline and optical flow domains, enabling faster and more accurate correspondence tasks.
The development of UFM opens up new possibilities for multi-modal, long-range, and real-time correspondence applications.
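
A (u,v) flow prediction maps each source pixel to a location in the target image. The sketch below shows that mapping together with mean end-point error, a common flow metric; whether the figures above are measured with this exact metric is an assumption, and the function names are illustrative.

```python
import math

def warp_point(x, y, flow):
    """Apply a (u, v) flow vector to a source pixel coordinate."""
    u, v = flow
    return x + u, y + v

def mean_endpoint_error(predicted, ground_truth):
    """Average Euclidean distance between predicted and true flow vectors."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("flow lists must be non-empty and of equal length")
    errors = [math.hypot(pu - gu, pv - gv)
              for (pu, pv), (gu, gv) in zip(predicted, ground_truth)]
    return sum(errors) / len(errors)

print(warp_point(10, 20, (3, -4)))
print(mean_endpoint_error([(3, 4), (0, 0)], [(0, 0), (0, 0)]))
```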

Arxiv

TTrace: Lightweight Error Checking and Diagnosis for Distributed Training

Distributed training is crucial for scaling the training of large neural network models like LLMs.
The complexity of distributed training programs makes them prone to silent bugs.
Common debugging practices using metrics may be inefficient for detecting such bugs.
TTrace is designed to detect and localize silent bugs in distributed training effectively.
TTrace collects intermediate tensors and compares them against a single-device reference to detect bugs.
Novel mathematical analysis is proposed to compare floating-point values in tensors and set thresholds for bug detection.
Experimental results show TTrace detects 11 existing bugs and 3 new bugs in Megatron-LM with minimal code changes.
TTrace is effective in various training recipes, including low-precision scenarios with BF16 and FP8.
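
The idea of comparing intermediate tensors against a single-device reference can be sketched as below. This is a simplified illustration, not TTrace's implementation: tensors are flat Python lists, the tensor names are made up, and the fixed threshold is a placeholder for the thresholds that the paper derives from its floating-point analysis.

```python
import math

def relative_error(candidate, reference):
    """L2 norm of the difference divided by the L2 norm of the reference."""
    if len(candidate) != len(reference):
        raise ValueError("tensors must have the same number of elements")
    diff = math.sqrt(sum((c - r) ** 2 for c, r in zip(candidate, reference)))
    norm = math.sqrt(sum(r ** 2 for r in reference))
    return diff / norm if norm > 0 else diff

def first_divergence(candidate_trace, reference_trace, threshold):
    """Name of the first tensor whose relative error exceeds threshold, else None."""
    for name, reference in reference_trace.items():
        if relative_error(candidate_trace[name], reference) > threshold:
            return name
    return None

# Hypothetical traces, ordered by position in the forward pass.
reference = {"layer0.out": [1.0, 2.0, 3.0], "layer1.out": [0.5, -0.5, 1.5]}
distributed = {"layer0.out": [1.0, 2.0, 3.001], "layer1.out": [0.5, -0.5, 2.5]}
print(first_divergence(distributed, reference, threshold=1e-2))
```

Reporting the first tensor that diverges, rather than only a final loss mismatch, is what localizes the bug to a part of the program.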

Arxiv

ScalableHD: Scalable and High-Throughput Hyperdimensional Computing Inference on Multi-Core CPUs

Hyperdimensional Computing (HDC) is a computing paradigm using high-dimensional hypervectors.
Recent HDC methods focus on iterative training for improved accuracy, accelerated on GPUs.
Efficient HDC inference has mostly been on specialized hardware, not multi-core CPUs.
ScalableHD is proposed for high-throughput HDC inference on multi-core CPUs.
ScalableHD uses a two-stage pipelined execution model parallelized across cores.
Intermediate results are streamed between stages to enhance cache locality.
Features like memory tiling and NUMA-aware worker-to-core binding are integrated for performance.
ScalableHD has variants for small and large batch sizes to exploit compute parallelism.
It achieves up to 10x speedup over TorchHD, maintaining accuracy for tasks like image classification.
ScalableHD shows robust scalability with throughput improvements as cores increase.
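
For readers new to HDC, inference is commonly described as encoding an input into a hypervector and then finding the most similar class hypervector. The sketch below shows that flow in plain Python. It is a generic illustration, not ScalableHD's code: the dimensionality, the random-projection encoding, and the toy features are assumptions, and mapping these two steps onto ScalableHD's two pipeline stages is also an assumption.

```python
import random

DIM = 2048  # hypervector dimensionality (illustrative)

def random_hypervector(rng):
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def encode(features, bases):
    """Random-projection encoding: sign of the feature-weighted sum of base hypervectors."""
    acc = [0.0] * DIM
    for x, base in zip(features, bases):
        for d in range(DIM):
            acc[d] += x * base[d]
    return [1 if a >= 0 else -1 for a in acc]

def bundle(hypervectors):
    """Element-wise majority vote over several hypervectors."""
    return [1 if sum(col) >= 0 else -1 for col in zip(*hypervectors)]

def classify(query, prototypes):
    """Label of the class hypervector with the highest dot-product similarity."""
    return max(prototypes,
               key=lambda label: sum(q * p for q, p in zip(query, prototypes[label])))

rng = random.Random(0)
bases = [random_hypervector(rng) for _ in range(4)]
prototypes = {
    "a": bundle([encode([1.0, 0.0, 0.0, 0.0], bases), encode([0.8, 0.2, 0.0, 0.0], bases)]),
    "b": bundle([encode([0.0, 0.0, 0.0, 1.0], bases), encode([0.0, 0.1, 0.0, 0.9], bases)]),
}
print(classify(encode([0.9, 0.1, 0.0, 0.05], bases), prototypes))
```

Both steps are dominated by dense element-wise arithmetic over long vectors, which is why cache locality and core binding matter for CPU throughput.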

Arxiv

Lightweight Object Detection Using Quantized YOLOv4-Tiny for Emergency Response in Aerial Imagery

Researchers introduce a lightweight object detection solution using quantized YOLOv4-Tiny for emergency response in aerial imagery.
The solution targets energy efficiency and effectiveness during emergency situations.
YOLOv4-Tiny, optimized through post-training quantization to INT8 precision, is the model of choice.
A custom-curated aerial emergency dataset with 10,820 annotated images was used for training.
The dataset creation was necessary due to the absence of publicly available drone-view emergency imagery.
Comparative evaluation against YOLOv5-small was conducted, showcasing metric comparisons such as mAP, F1 score, inference time, and model size.
The quantized YOLOv4-Tiny demonstrated comparable detection performance, reduced model size from 22.5 MB to 6.4 MB, and boosted inference speed by 44%.
The model's attributes make it well-suited for real-time emergency detection on low-power edge devices.
The study contributes a lightweight detection approach and a custom dataset that fills the gap left by the lack of relevant public datasets.
The results emphasize that energy-efficient object detection can support emergency response without compromising detection accuracy.
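
To illustrate what INT8 post-training quantization does to a weight tensor, here is a minimal sketch of symmetric per-tensor quantization, followed by the size reduction implied by the reported figures. The quantization scheme and the sample weights are assumptions; the summary does not state which toolchain or scheme the authors used.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.0, 1.0]  # hypothetical weights
q, scale = quantize_int8(weights)
print(q, dequantize(q, scale))

# Size reduction implied by the reported 22.5 MB -> 6.4 MB.
size_reduction = 1 - 6.4 / 22.5
print(f"{size_reduction:.1%}")
```

Each weight is stored in one byte instead of four, which accounts for most of the roughly 72% size reduction computed from the reported figures.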

Arxiv

What is the Cost of Differential Privacy for Deep Learning-Based Trajectory Generation?

Differential Privacy (DP) is used to protect sensitive personal information in location trajectories but balancing utility and privacy is difficult.
Deep learning-based generative models can create synthetic trajectories, but current models lack formal privacy guarantees and rely on conditional information.
A study evaluated the utility cost of enforcing DP in these models across two datasets and eleven utility metrics.
The evaluation looked at the impact of DP-SGD on generative models and proposed a novel DP mechanism for conditional generation with formal guarantees.
Diffusion, VAE, and GAN model types were analyzed for their effects on the utility-privacy trade-off.
Results indicated that DP-SGD significantly affects performance, with some utility remaining for large datasets.
The proposed DP mechanism enhances training stability, especially for GANs and smaller datasets.
Diffusion models show the best utility without guarantees, but GANs perform best with DP-SGD.
This suggests that the best non-private model is not necessarily the best choice once formal guarantees are required.
DP trajectory generation remains challenging and formal guarantees are currently more feasible with large datasets and in specific use cases.
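
The core DP-SGD step, clipping each example's gradient and adding Gaussian noise before averaging, can be sketched as follows. This is a generic illustration of DP-SGD rather than the study's code: gradients are plain lists, the parameter values are arbitrary, and privacy accounting is omitted.

```python
import math
import random

def clip(gradient, max_norm):
    """Scale a gradient down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in gradient]

def dp_sgd_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip per-example gradients, sum them, add Gaussian noise, then average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(col) for col in zip(*clipped)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [n / len(per_example_grads) for n in noisy]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # hypothetical per-example gradients
print(dp_sgd_gradient(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng))
```

The noise scale is fixed by the clipping norm and does not shrink with the data, so averaging over larger batches and datasets dilutes it, which is consistent with the finding that more utility remains for large datasets.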

Arxiv

Surrogate models to optimize plasma assisted atomic layer deposition in high aspect ratio features

Researchers are investigating surrogate models to enhance plasma assisted atomic layer deposition (PEALD) in high aspect ratio features.
Plasma-based processes like PEALD can face challenges from surface recombination, requiring long exposure times for full conformality in high aspect ratio vias.
Artificial neural networks were trained on a synthetic dataset generated from PEALD simulations to predict saturation times based on cross section thickness data from partially coated conditions.
Results show that just two experiments in undersaturated conditions provide enough information to predict saturation times accurately within 10% of the actual time.
A surrogate model achieved 99% accuracy in determining whether surface recombination dominates plasma-surface interactions in PEALD processes.
Machine learning offers a faster route for optimizing PEALD processes in applications such as microelectronics.
The approach can also be extended to atomic layer etching and more complex structures.

Arxiv

Alzheimer's Dementia Detection Using Perplexity from Paired Large Language Models

Alzheimer's dementia (AD) is a neurodegenerative disorder marked by cognitive decline that impairs language ability.
This study uses a large language model (LLM), Mistral-7B, for AD detection through a paired perplexity method.
The approach presented in this work improves detection accuracy by 3.33% compared to the best current method and by 6.35% over the top-ranked method from the ADReSS 2020 challenge benchmark.
The proposed approach provides a clear and interpretable decision boundary for AD detection, unlike other methods with opaque decision-making processes.
Analysis shows that the LLMs utilized have learned the unique language patterns of AD speakers, enhancing model interpretation and data augmentation possibilities.
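
A paired perplexity comparison can be sketched as below, assuming one model adapted to AD speech and one adapted to control speech, each returning per-token log-probabilities for the same transcript. The log-probabilities, the function names, and the zero threshold are hypothetical; the paper's exact scoring rule may differ.

```python
import math

def perplexity(token_logprobs):
    """Exponential of the mean negative log-probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def paired_score(logprobs_ad_model, logprobs_control_model):
    """Positive when the control model is more perplexed than the AD model."""
    return (math.log(perplexity(logprobs_control_model))
            - math.log(perplexity(logprobs_ad_model)))

# Hypothetical per-token log-probabilities for one transcript.
ad_model = [math.log(0.5)] * 4
control_model = [math.log(0.25)] * 4
score = paired_score(ad_model, control_model)
print("AD-like" if score > 0.0 else "control-like", round(score, 3))
```

A single scalar score with a threshold is one way to obtain the kind of interpretable decision boundary the summary mentions.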

Arxiv

Ming-Omni: A Unified Multimodal Model for Perception and Generation

Ming-Omni is a unified multimodal model capable of processing images, text, audio, and video efficiently.
Dedicated encoders extract tokens from each modality, and an MoE architecture named Ling processes them; the model is also proficient in speech and image generation.
The model uses modality-specific routers to process tokens from different modalities within a unified framework.
Ming-Omni can handle diverse tasks without needing separate models, task-specific fine-tuning, or structural redesign.
It supports audio and image generation, featuring an advanced audio decoder for natural speech generation and Ming-Lite-Uni for high-quality image generation.
The model can engage in tasks like context-aware chatting, text-to-speech conversion, and versatile image editing.
Experimental results demonstrate that Ming-Omni offers a powerful solution for unified perception and generation across all modalities.
Ming-Omni is the first open-source model known to match GPT-4o in modality support.
All code and model weights of Ming-Omni have been released to encourage further research and development in the community.

Arxiv

Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation

Researchers propose autoregressive adversarial post-training (AAPT) to enable real-time interactive video generation.
Existing large-scale video generation models are computationally intensive, hindering real-time and interactive usage.
AAPT transforms a pre-trained latent video diffusion model into a real-time, interactive video generator.
The model generates a latent frame at a time using a single neural function evaluation, enabling real-time streaming and interactive control.
This approach leverages adversarial training for autoregressive generation, enhancing efficiency and error reduction.
The 8B model from the study achieves real-time video generation at 24 fps and 736x416 resolution on a single H100 GPU.
On 8xH100 GPUs, the model could generate 1280x720 resolution videos up to a minute long (1440 frames) in real-time.
AAPT's design utilizes the KV cache efficiently and employs student-forcing during training to reduce error accumulation over long video sequences.
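
The real-time figures above imply a simple timing budget, worked out below from the reported frame rate and frame count. The helper names are illustrative. The summary does not say how many displayed frames one latent frame decodes to, so the budget is stated per displayed frame on average.

```python
FPS = 24        # frame rate reported in the summary
FRAMES = 1440   # frame count reported for the one-minute setting

def duration_seconds(frames, fps):
    """Playback duration of a clip."""
    return frames / fps

def per_frame_budget_ms(fps):
    """Average time available per displayed frame to keep up with playback."""
    return 1000.0 / fps

print(duration_seconds(FRAMES, FPS))
print(round(per_frame_budget_ms(FPS), 1))
```

The first value confirms that 1440 frames at 24 fps is one minute of video; the second is the average generation time per displayed frame that real-time streaming allows.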

Arxiv

SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending

Humanoid robots are valuable for daily tasks due to their flexibility and human-like features.
Previous methods for whole-body control and loco-manipulation in humanoids require task-specific tuning.
SkillBlender is a new hierarchical reinforcement learning framework for versatile humanoid loco-manipulation.
SkillBlender pretrains task-agnostic primitive skills and blends them dynamically for complex tasks.
SkillBench is introduced as a benchmark with diverse embodiments, skills, and tasks for evaluation.
Extensive simulated experiments show SkillBlender outperforms baselines in loco-manipulation tasks.
SkillBlender also prevents reward hacking and produces accurate and feasible movements.
The project code and benchmark will be open-sourced to support future research.
