Latest

Fresh from the feed


Beyond Observations: Reconstruction Error-Guided Irregularly Sampled Time Series Representation Learning
paper
arXiv stat.ML · 3 days ago

arXiv:2511.06854v2 Announce Type: replace-cross Abstract: Irregularly sampled time series (ISTS), characterized by non-uniform time intervals with natural missingness, are prevalent in real-world applications. Existing approaches for ISTS modeling primarily rely on observed values to impute unobserved ones or infer latent dynamics. However, these methods overlook a critical source of learning signal: the reconstruction error inherently produced during model training. Such error implicitly reflects how well a model captures the underlying data structure and can serve as an informative proxy for unobserved values. To exploit this insight, we propose iTimER, a simple yet effective self-supervised pre-training framework for ISTS representation learning. iTimER models the distribution of reconstruction errors over observed values and generates pseudo-observations for unobserved timestamps through a mixup strategy between sampled errors and the last available observations. This transforms unobserved timestamps into noise-aware training targets, enabling meaningful reconstruction signals. A Wasserstein metric aligns reconstruction error distributions between observed and pseudo-observed regions, while a contrastive learning objective enhances the discriminability of learned representations. Extensive experiments on classification, interpolation, and forecasting tasks demonstrate that iTimER consistently outperforms state-of-the-art methods under the ISTS setting.
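The core move, treating reconstruction errors on observed steps as a noise model for unobserved ones, is easy to sketch. A minimal Python illustration, where the mixup weight `lam` and the empirical error sampling are assumptions (the paper's exact strategy and schedule may differ):

```python
import numpy as np

def pseudo_observations(values, mask, recon, lam=0.7, rng=None):
    """Pseudo-observation generation in the spirit of iTimER (a sketch).

    values: (T,) series, meaningful only where mask is True
    mask:   (T,) bool, True at observed timestamps
    recon:  (T,) the model's current reconstruction
    """
    rng = rng or np.random.default_rng(0)
    errors = values[mask] - recon[mask]        # reconstruction errors on observed steps
    out = values.astype(float).copy()
    last = None
    for t in range(len(values)):
        if mask[t]:
            last = values[t]                   # track last available observation
        elif last is not None:
            e = rng.choice(errors)             # sample from the empirical error distribution
            out[t] = lam * last + (1 - lam) * (last + e)   # mixup -> noise-aware target
    return out
```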

#ai
Score · 2.80
Bridging Constraints and Stochasticity: A Fully First-Order Method for Stochastic Bilevel Optimization with Linear Constraints
paper
arXiv stat.ML · 3 days ago

arXiv:2511.09845v2 Announce Type: replace-cross Abstract: This work provides the first finite-time convergence guarantees for linearly constrained stochastic bilevel optimization using only first-order methods, requiring solely gradient information without any Hessian computations or second-order derivatives. We address the unprecedented challenge of simultaneously handling linear constraints, stochastic noise, and finite-time analysis in bilevel optimization, a combination that has remained theoretically intractable until now. While existing approaches either require second-order information, handle only unconstrained stochastic problems, or provide merely asymptotic convergence results, our method achieves finite-time guarantees using gradient-based techniques alone. We develop a novel framework that constructs hypergradient approximations via smoothed penalty functions, using approximate primal and dual solutions to overcome the fundamental challenges posed by the interaction between linear constraints and stochastic noise. Our theoretical analysis provides explicit finite-time bounds on the bias and variance of the hypergradient estimator, demonstrating how approximation errors interact with stochastic perturbations. We prove that our first-order algorithm converges to $(\delta, \epsilon)$-Goldstein stationary points using $\Theta(\delta^{-1}\epsilon^{-5})$ stochastic gradient evaluations, establishing the first finite-time complexity result for this challenging problem class and representing a significant theoretical breakthrough in constrained stochastic bilevel optimization.
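For reference, the complexity bound is stated for $(\delta, \epsilon)$-Goldstein stationary points, and the hypergradient is built from a penalized lower-level value function. One common form of both, with the caveat that the paper's exact smoothing may differ:
\[
\min_{x} \; F(x) := f\bigl(x, y^{*}(x)\bigr), \qquad y^{*}(x) \in \arg\min_{y:\, Ay \le b} g(x, y),
\]
\[
F_{\sigma}(x) := \min_{y:\, Ay \le b} \Bigl[\, f(x, y) + \sigma \bigl( g(x, y) - \min_{z:\, Az \le b} g(x, z) \bigr) \Bigr],
\]
and a point $x$ is $(\delta, \epsilon)$-Goldstein stationary when
\[
\operatorname{dist}\bigl( 0, \; \operatorname{conv}\{ \nabla F(u) : \|u - x\| \le \delta \} \bigr) \le \epsilon .
\]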

#ai
Score · 2.80
Phase-Coded Memory and Morphological Resonance: A Next-Generation Retrieval-Augmented Generator Architecture
paper
arXiv cs.NE · 3 days ago

arXiv:2511.11848v1 Announce Type: new Abstract: This paper introduces a cognitive Retrieval-Augmented Generator (RAG) architecture that transcends transformer context-length limitations through phase-coded memory and morphological-semantic resonance. Instead of token embeddings, the system encodes meaning as complex wave patterns with amplitude-phase structure. A three-tier design is presented: a Morphological Mapper that transforms inputs into semantic waveforms, a Field Memory Layer that stores knowledge as distributed holographic traces and retrieves it via phase interference, and a Non-Contextual Generator that produces coherent output guided by resonance rather than fixed context. This approach eliminates sequential token dependence, greatly reduces memory and computational overhead, and enables unlimited effective context through frequency-based semantic access. The paper outlines theoretical foundations, pseudocode implementation, and experimental evidence from related complex-valued neural models, emphasizing substantial energy, storage, and time savings.
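A toy rendering of the retrieval idea: meanings as complex waveforms, storage as normalized traces, retrieval by the magnitude of phase-coherent overlap. Everything here (vocabulary hashing, dimensions) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

VOCAB, DIM = 10_000, 64
_LUT = np.random.default_rng(0).uniform(0, 2 * np.pi, size=(VOCAB, DIM))

def encode_wave(token_ids):
    """Toy stand-in for the Morphological Mapper: map tokens to random
    phases and superpose them into one complex waveform."""
    phases = _LUT[np.asarray(token_ids) % VOCAB]
    return np.exp(1j * phases).mean(axis=0)

def store(traces, wave):
    traces.append(wave / np.linalg.norm(wave))     # normalized holographic trace

def retrieve(traces, query):
    q = query / np.linalg.norm(query)
    # |conjugate inner product| is large when phases interfere constructively
    return int(np.argmax([abs(np.vdot(t, q)) for t in traces]))
```

Retrieval cost scales with the number of stored traces rather than with any context window, which is the property the abstract leans on.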

#research
Score · 2.80
Benchmarking that Matters: Rethinking Benchmarking for Practical Impact
paper
arXiv cs.NE · 3 days ago

arXiv:2511.12264v1 Announce Type: new Abstract: Benchmarking has driven scientific progress in Evolutionary Computation, yet current practices fall short of real-world needs. Widely used synthetic suites such as BBOB and CEC isolate algorithmic phenomena but poorly reflect the structure, constraints, and information limitations of continuous and mixed-integer optimization problems in practice. This disconnect leads to the misuse of benchmarking suites for competitions, automated algorithm selection, and industrial decision-making, despite these suites being designed for different purposes. We identify key gaps in current benchmarking practices and tooling, including limited availability of real-world-inspired problems, missing high-level features, and challenges in multi-objective and noisy settings. We propose a vision centered on curated real-world-inspired benchmarks, practitioner-accessible feature spaces, and community-maintained performance databases. Real progress requires coordinated effort: a living benchmarking ecosystem that evolves with real-world insights and supports both scientific understanding and industrial use.

#ai
Score · 2.80
Random-Key Metaheuristic and Linearization for the Quadratic Multiple Constraints Variable-Sized Bin Packing Problem
paper
arXiv cs.NE · 3 days ago

arXiv:2511.12367v1 Announce Type: new Abstract: This paper addresses the Quadratic Multiple Constraints Variable-Sized Bin Packing Problem (QMC-VSBPP), a challenging combinatorial optimization problem that generalizes classical bin packing by incorporating multiple capacity dimensions, heterogeneous bin types, and quadratic interaction costs between items. We propose two complementary methods that advance the current state of the art. First, a linearized mathematical formulation is introduced to eliminate quadratic terms, enabling the use of exact solvers such as Gurobi to compute strong lower bounds, reported here for the first time for this problem. Second, we develop RKO-ACO, a continuous-domain Ant Colony Optimization algorithm within the Random-Key Optimization framework, enhanced with adaptive Q-learning parameter control and efficient local search. Extensive computational experiments on benchmark instances show that the proposed linearized model produces significantly tighter lower bounds than the original quadratic formulation, while RKO-ACO consistently matches or improves upon all best-known solutions in the literature, establishing new upper bounds for large-scale instances. These results provide new reference values for future studies and demonstrate the effectiveness of evolutionary and random-key metaheuristic approaches for solving complex quadratic packing problems. Source code and data are available at https://github.com/nataliaalves03/RKO-ACO.
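The abstract does not spell the linearization out, but the standard device for a product of binary variables $x_i x_j$ (which may or may not be the authors' exact formulation) replaces it with an auxiliary variable $w_{ij}$ and linear inequalities:
\[
w_{ij} \le x_i, \qquad w_{ij} \le x_j, \qquad w_{ij} \ge x_i + x_j - 1, \qquad w_{ij} \ge 0,
\]
which is exact for binary $x_i, x_j$ and leaves a purely linear model that solvers such as Gurobi can bound.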

#ai
#research
#open_source
Score · 2.80
Tokenize Once, Recommend Anywhere: Unified Item Tokenization for Multi-domain LLM-based Recommendation
paper
arXiv cs.NE · 3 days ago

arXiv:2511.12922v1 Announce Type: cross Abstract: Large language model (LLM)-based recommender systems have achieved high-quality performance by bridging the discrepancy between the item space and the language space through item tokenization. However, existing item tokenization methods typically require training separate models for each item domain, limiting generalization. Moreover, the diverse distributions and semantics across item domains make it difficult to construct a unified tokenization that preserves domain-specific information. To address these challenges, we propose UniTok, a Unified item Tokenization framework that integrates our own mixture-of-experts (MoE) architecture with a series of codebooks to convert items into discrete tokens, enabling scalable tokenization while preserving semantic information across multiple item domains. Specifically, items from different domains are first projected into a unified latent space through a shared encoder. They are then routed to domain-specific experts to capture the unique semantics, while a shared expert, which is always active, encodes common knowledge transferable across domains. Additionally, to mitigate semantic imbalance across domains, we present a mutual information calibration mechanism, which guides the model towards retaining similar levels of semantic information for each domain. Comprehensive experiments on wide-ranging real-world datasets demonstrate that the proposed UniTok framework is (a) highly effective: achieving up to 51.89% improvements over strong benchmarks; (b) theoretically sound: showing the analytical validity of our architectural design and optimization; and (c) highly generalizable: demonstrating robust performance across diverse domains without requiring per-domain retraining, a capability not supported by existing baselines.
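A minimal sketch of the described expert layout (shared encoder into a unified latent space, routed domain experts, an always-active shared expert); layer sizes, routing by a known domain id, and the additive combination are assumptions:

```python
import torch
import torch.nn as nn

class UniTokStyleEncoder(nn.Module):
    """Sketch of the layout described above, not the authors' code."""
    def __init__(self, d_in=768, d_lat=256, n_domains=4):
        super().__init__()
        self.shared_enc = nn.Linear(d_in, d_lat)
        self.domain_experts = nn.ModuleList(
            nn.Linear(d_lat, d_lat) for _ in range(n_domains))
        self.shared_expert = nn.Linear(d_lat, d_lat)   # common, transferable knowledge
    def forward(self, item_emb, domain_id):
        z = torch.relu(self.shared_enc(item_emb))      # unified latent space
        return self.domain_experts[domain_id](z) + self.shared_expert(z)
```

The discrete tokens would then come from quantizing this output against the series of codebooks, a step the sketch omits.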

#ai
#llm
Score · 2.80
Self-Organization of Attractor Landscapes in High-Capacity Kernel Logistic Regression Hopfield Networks
paper
arXiv cs.NE · 3 days ago

arXiv:2511.13053v1 Announce Type: cross Abstract: Kernel-based learning methods can dramatically increase the storage capacity of Hopfield networks, yet the dynamical mechanism behind this enhancement remains poorly understood. We address this gap by conducting a geometric analysis of the network's energy landscape. We introduce a novel metric, "Pinnacle Sharpness," to quantify the local stability of attractors. By systematically varying the kernel width and storage load, we uncover a rich phase diagram of attractor shapes. Our central finding is the emergence of a "ridge of optimization," where the network maximizes attractor stability under challenging high-load and global-kernel conditions. Through a theoretical decomposition of the landscape gradient into a direct "driving" force and an indirect "feedback" force, we reveal the origin of this phenomenon. The optimization ridge corresponds to a regime of strong anti-correlation between the two forces, where the direct force, amplified by the high storage load, dominates the opposing collective feedback force. This demonstrates a sophisticated self-organization mechanism: the network adaptively harnesses inter-pattern interactions as a cooperative feedback control system to sculpt a robust energy landscape. Our findings provide a new physical picture for the stability of high-capacity associative memories and offer principles for their design.

#ai
Score · 2.80
A Generalized and Configurable Benchmark Generator for Continuous Unconstrained Numerical Optimization
paper
arXiv cs.NE · 3 days ago

arXiv:2312.07083v2 Announce Type: replace Abstract: As optimization challenges continue to evolve, so too must our tools and understanding. To effectively assess, validate, and compare optimization algorithms, it is crucial to use a benchmark test suite that encompasses a diverse range of problem instances with various characteristics. Traditional benchmark suites often consist of numerous fixed test functions, making it challenging to align these with specific research objectives, such as the systematic evaluation of algorithms under controllable conditions. This paper introduces the Generalized Numerical Benchmark Generator (GNBG) for single-objective, box-constrained, continuous numerical optimization. Unlike the commonly used test suites that rely on multiple baseline functions and transformations, GNBG utilizes a single, parametric, and configurable baseline function. This design allows for control over various problem characteristics. Researchers using GNBG can generate instances that cover a broad range of morphological features, from unimodal to highly multimodal functions, various local optima patterns, and symmetric to highly asymmetric structures. The generated problems can also vary in separability, variable interaction structures, dimensionality, conditioning, and basin shapes. These customizable features enable the systematic evaluation and comparison of optimization methods, allowing researchers to examine the strengths and weaknesses of algorithms under diverse and controllable conditions.
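To make "single, parametric, and configurable baseline function" concrete, here is an illustrative generator in the same spirit, explicitly not the actual GNBG formula: a minimum over transformed quadratic basins, where basin count sets modality and the Hessian spectrum sets conditioning:

```python
import numpy as np

def make_toy_instance(n_basins=3, dim=10, cond=100.0, seed=0):
    """Illustrative stand-in: f(x) = min_k sigma_k + (x-mu_k)^T H_k (x-mu_k)."""
    rng = np.random.default_rng(seed)
    mus = rng.uniform(-5, 5, (n_basins, dim))          # basin centers
    sigmas = np.sort(rng.uniform(1, 10, n_basins))
    sigmas[0] = 0.0                                    # unique global optimum value
    hessians = [np.diag(np.logspace(0, np.log10(cond), dim))
                for _ in range(n_basins)]              # controls conditioning
    def f(x):
        return min(s + (x - m) @ H @ (x - m)
                   for s, m, H in zip(sigmas, mus, hessians))
    return f
```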

#ai
#research
Score · 2.80
State-Space Constraints Can Improve the Generalisation of the Differentiable Neural Computer to Input Sequences With Unseen Length
paper
arXiv cs.NE · 3 days ago

arXiv:2110.09138v2 Announce Type: replace-cross Abstract: Memory-augmented neural networks (MANNs) can perform algorithmic tasks such as sorting. However, they often fail to generalise to input sequence lengths not encountered during training. We introduce two approaches that constrain the state space of the MANN's controller network: state compression and state regularisation. We empirically demonstrated that both approaches can improve generalisation to input sequences of out-of-distribution lengths for a specific type of MANN: the differentiable neural computer (DNC). The constrained DNC could process input sequences that were up to 2.3 times longer than those processed by an unconstrained baseline controller network. Notably, the applied constraints enabled the extension of the DNC's memory matrix without the need for retraining and thus allowed the processing of input sequences that were 10.4 times longer. However, the improvements were not consistent across all tested algorithmic tasks. Interestingly, solutions that performed better often had a highly structured state space, characterised by state trajectories exhibiting increased curvature and loop-like patterns. Our experimental work demonstrates that state-space constraints can enable the training of a DNC using shorter input sequences, thereby saving computational resources and facilitating training when acquiring long sequences is costly.
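A sketch of the two constraint types as they might attach to a DNC-style controller; the bottleneck width, penalty weight, and exact placement are assumptions, and the paper's formulations may differ:

```python
import torch
import torch.nn as nn

class ConstrainedController(nn.Module):
    """Sketch: state compression via a low-dimensional bottleneck, plus an
    L2 state-regularisation term to be added to the task loss."""
    def __init__(self, d_in, d_hidden=256, d_compressed=32, reg_weight=1e-3):
        super().__init__()
        self.cell = nn.LSTMCell(d_in, d_hidden)
        self.down = nn.Linear(d_hidden, d_compressed)   # compression
        self.up = nn.Linear(d_compressed, d_hidden)
        self.reg_weight = reg_weight
    def forward(self, x, state):
        h, c = self.cell(x, state)
        h = self.up(torch.tanh(self.down(h)))           # constrained state space
        reg = self.reg_weight * h.pow(2).mean()         # regularisation term
        return (h, c), reg
```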

#ai
Score · 2.80
Ken Utilization Layer: Hebbian Replay Within a Student's Ken for Adaptive Exercise Recommendation
paper
arXiv cs.NE · 3 days ago

arXiv:2507.00032v2 Announce Type: replace-cross Abstract: Adaptive exercise recommendation (ER) aims to choose the next activity that matches a learner's evolving Zone of Proximal Development (ZPD). We present KUL-Rec, a biologically inspired ER system that couples a fast Hebbian memory with slow replay-based consolidation to enable continual, few-shot personalization from sparse interactions. The model operates in an embedding space, allowing a single architecture to handle both tabular knowledge-tracing logs and open-ended short-answer text. We align evaluation with tutoring needs using bidirectional ranking and rank-sensitive metrics (nDCG, Recall@K). Across ten public datasets, KUL-Rec improves macro nDCG (0.316 vs. 0.265 for the strongest baseline) and Recall@10 (0.305 vs. 0.211), while achieving low inference latency and an $\approx 99\%$ reduction in peak GPU memory relative to a competitive graph-based model. In a 13-week graduate course, KUL-Rec personalized weekly short-answer quizzes generated by a retrieval-augmented pipeline and the personalized quizzes were associated with lower perceived difficulty and higher helpfulness (p < .05). An embedding robustness audit highlights that encoder choice affects semantic alignment, motivating routine audits when deploying open-response assessment. Together, these results indicate that Hebbian replay with bounded consolidation offers a practical path to real-time, interpretable ER that scales across data modalities and classroom settings.
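A minimal sketch of the fast-Hebbian-write, slow-replay-consolidation split described above; the learning rates, decay, and replay rule are assumptions:

```python
import numpy as np

class HebbianMemory:
    """Sketch of a fast associative store with slow replay consolidation."""
    def __init__(self, dim, lr=0.5, decay=0.99):
        self.W = np.zeros((dim, dim))
        self.lr, self.decay = lr, decay
        self.buffer = []                          # episodic buffer for replay
    def write(self, key, value):
        # fast one-shot Hebbian update with mild forgetting
        self.W = self.decay * self.W + self.lr * np.outer(value, key)
        self.buffer.append((key, value))
    def consolidate(self, slow_lr=0.05, n_replays=10, rng=None):
        rng = rng or np.random.default_rng(0)
        for _ in range(min(n_replays, len(self.buffer))):
            k, v = self.buffer[rng.integers(len(self.buffer))]
            self.W += slow_lr * np.outer(v - self.W @ k, k)   # bounded correction
    def read(self, key):
        return self.W @ key
```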

#ai
Score · 2.80
Amorphous Solid Model of Vectorial Hopfield Neural Networks
paper
arXiv cs.NE · 3 days ago

arXiv:2507.22787v4 Announce Type: replace-cross Abstract: We introduce a three-dimensional vectorial extension of the Hopfield associative-memory model in which each neuron is a unit vector on $S^2$ and synaptic couplings are $3\times 3$ blocks generated through a vectorial Hebbian rule. The resulting block-structured operator is mathematically analogous to the Hessian of amorphous solids and induces a rigid energy landscape with deep minima for stored patterns. Simulations and spectral analysis show that the vectorial network substantially outperforms the classical binary Hopfield model. For moderate connectivity, the critical storage ratio $\gamma_c$ grows approximately linearly with the coordination number $Z$, while for $Z\gtrsim 40$ a high-connectivity regime emerges in which $\gamma_c$ systematically exceeds the extrapolated low-$Z$ linear fit. At the same time, a persistent spectral gap separates pattern modes from the bulk and basins of attraction enlarge, yielding enhanced robustness to initialization noise. Thus geometric constraints combined with amorphous-solid-inspired structure produce associative memories with superior storage and retrieval performance, especially in the high-connectivity ($Z \gtrsim 20$-$30$) regime.
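The coupling structure is compact to state. A sketch, assuming patterns of shape (P, N, 3) with unit-vector spins, zeroed diagonal blocks, and a 1/N normalization convention (a guess on our part):

```python
import numpy as np

def vector_hebbian_couplings(patterns):
    """J[i, j] is the 3x3 block sum_mu xi_i^mu (xi_j^mu)^T / N."""
    P, N, _ = patterns.shape
    J = np.einsum('pia,pjb->ijab', patterns, patterns) / N
    J[np.arange(N), np.arange(N)] = 0.0        # no self-coupling
    return J

def relax(J, s, steps=50):
    """Zero-temperature dynamics: align each spin with its local field."""
    N = s.shape[0]
    for _ in range(steps):
        for i in range(N):
            h = np.einsum('jab,jb->a', J[i], s)   # local field on neuron i
            n = np.linalg.norm(h)
            if n > 0:
                s[i] = h / n                      # re-normalize onto S^2
    return s
```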

#ai
Score · 2.80
Hyperellipsoid Density Sampling: Exploitative Sequences to Accelerate High-Dimensional Optimization
paper
arXiv cs.NE · 3 days ago

arXiv:2511.07836v3 Announce Type: replace-cross Abstract: The curse of dimensionality presents a pervasive challenge in optimization problems, with exponential expansion of the search space rapidly causing traditional algorithms to become inefficient or infeasible. An adaptive sampling strategy is presented to accelerate optimization in this domain as an alternative to uniform quasi-Monte Carlo (QMC) methods. This method, referred to as Hyperellipsoid Density Sampling (HDS), generates its sequences by defining multiple hyperellipsoids throughout the search space. HDS uses three types of unsupervised learning algorithms to circumvent high-dimensional geometric calculations, producing an intelligent, non-uniform sample sequence that exploits statistically promising regions of the parameter space and improves final solution quality in high-dimensional optimization problems. A key feature of the method is optional Gaussian weights, which may be provided to influence the sample distribution towards known locations of interest. This capability makes HDS versatile for applications beyond optimization, providing a focused, denser sample distribution where models need to concentrate their efforts on specific, non-uniform regions of the parameter space. The method was evaluated against Sobol, a standard QMC method, using differential evolution (DE) on the 29 CEC2017 benchmark test functions. The results show statistically significant improvements in solution geometric mean error (p < 0.05), with average performance gains ranging from 3% in 30D to 37% in 10D. This paper demonstrates the efficacy of HDS as a robust alternative to QMC sampling for high-dimensional optimization.
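A sketch of the hyperellipsoid idea rather than the authors' exact pipeline: cluster promising seed points, fit one ellipsoid per cluster from its covariance, and draw samples inside each instead of uniformly over the box:

```python
import numpy as np
from sklearn.cluster import KMeans

def hds_like_sample(n_samples, seeds, n_ellipsoids=5, seed=0):
    """seeds: (M, dim) array of statistically promising points."""
    rng = np.random.default_rng(seed)
    dim = seeds.shape[1]
    labels = KMeans(n_clusters=n_ellipsoids, n_init=10,
                    random_state=seed).fit_predict(seeds)
    chunks = []
    for k in range(n_ellipsoids):
        pts = seeds[labels == k]
        mean = pts.mean(axis=0)
        cov = np.cov(pts.T) + 1e-6 * np.eye(dim)        # regularized ellipsoid
        chunks.append(rng.multivariate_normal(mean, cov, n_samples // n_ellipsoids))
    return np.vstack(chunks)
```

The optional Gaussian weights mentioned above would enter here as a bias on the cluster means or sampling covariances toward known regions of interest.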

#ai
#research
Score · 2.80
Softmax as a Lagrangian-Legendrian Seam
paper
arXiv cs.LG · 3 days ago

arXiv:2511.11573v1 Announce Type: new Abstract: This note offers a first bridge from machine learning to modern differential geometry. We show that the logits-to-probabilities step implemented by softmax can be modeled as a geometric interface: two potential-generated, conservative descriptions (from negative entropy and log-sum-exp) meet along a Legendrian "seam" on a contact screen (the probability simplex) inside a simple folded symplectic collar. Bias-shift invariance appears as Reeb flow on the screen, and the Fenchel-Young equality/KL gap provides a computable distance to the seam. We work out the two- and three-class cases to make the picture concrete and outline next steps for ML: compact logit models (projective or spherical), global invariants, and connections to information geometry where on-screen dynamics manifest as replicator flows.
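The computable distance to the seam mentioned above is the classical Fenchel-Young gap between log-sum-exp and negative entropy. With $\mathrm{LSE}(z) = \log \sum_i e^{z_i}$ and $\Omega(p) = \sum_i p_i \log p_i$ on the simplex,
\[
\mathrm{LSE}(z) + \Omega(p) - \langle z, p \rangle = \mathrm{KL}\bigl( p \,\|\, \operatorname{softmax}(z) \bigr) \ge 0,
\]
with equality exactly when $p = \operatorname{softmax}(z)$, i.e. on the seam. Bias-shift invariance is visible here too: replacing $z$ by $z + c\mathbf{1}$ leaves $\operatorname{softmax}(z)$, and hence the gap, unchanged.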

Score · 2.80
LLM on a Budget: Active Knowledge Distillation for Efficient Classification of Large Text Corpora
paper
arXiv cs.LG · 3 days ago

arXiv:2511.11574v1 Announce Type: new Abstract: Large Language Models (LLMs) are highly accurate in classification tasks; however, substantial computational and financial costs hinder their large-scale deployment in dynamic environments. Knowledge Distillation (KD), where an LLM "teacher" trains a smaller and more efficient "student" model, offers a promising solution to this problem. However, the distillation process itself often remains costly for large datasets, since it requires the teacher to label a vast number of samples while incurring significant token consumption. To alleviate this challenge, in this work we explore active learning (AL) as a way to create efficient student models at a fraction of the cost while preserving the LLM's performance. In particular, we introduce M-RARU (Multi-class Randomized Accept/Reject Uncertainty Sampling), a novel AL algorithm that significantly reduces training costs. M-RARU employs an innovative strategy combining uncertainty with a randomized accept-reject mechanism to select only the most informative data points for the LLM teacher. This focused approach significantly minimizes required API calls and data processing time. We evaluate M-RARU against random sampling across five diverse student models (SVM, LDA, RF, GBDT, and DistilBERT) on multiple benchmark datasets. Experiments demonstrate that our proposed method achieves up to an 80% reduction in sample requirements as compared to random sampling, substantially improving classification accuracy while reducing financial costs and overall training time.
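A sketch of a randomized accept/reject uncertainty sampler of the kind described; the margin-based uncertainty score and the acceptance rule are assumptions about M-RARU's exact mechanics:

```python
import numpy as np

def raru_select(probs, budget, rng=None):
    """probs: (n, classes) student predictions; returns indices to send
    to the LLM teacher for labeling."""
    rng = rng or np.random.default_rng(0)
    top2 = np.sort(probs, axis=1)[:, -2:]
    uncertainty = 1.0 - (top2[:, 1] - top2[:, 0])   # small margin -> high uncertainty
    selected = []
    for i in rng.permutation(len(probs)):
        if len(selected) >= budget:
            break
        if rng.random() < uncertainty[i]:           # randomized accept/reject
            selected.append(i)
    return selected
```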

#ai
#llm
Score · 2.80
Detecting Statistically Significant Fairness Violations in Recidivism Forecasting Algorithms
paper
arXiv cs.LG · 3 days ago

arXiv:2511.11575v1 Announce Type: new Abstract: Machine learning algorithms are increasingly deployed in critical domains such as finance, healthcare, and criminal justice [1]. The increasing popularity of algorithmic decision-making has stimulated interest in algorithmic fairness within the academic community. Researchers have introduced various fairness definitions that quantify disparities between privileged and protected groups, that use causal inference to determine the impact of race on model predictions, and that test the calibration of the model's probability predictions. Existing literature does not provide a way to assess whether observed disparities between groups are statistically significant or merely due to chance. This paper introduces a rigorous framework for testing the statistical significance of fairness violations by leveraging k-fold cross-validation [2] to generate sampling distributions of fairness metrics. We introduce statistical tests that can be used to identify statistically significant violations of fairness metrics based on disparities between predicted and actual outcomes, model calibration, and causal inference techniques [1]. We demonstrate this approach by testing recidivism forecasting algorithms trained on data from the National Institute of Justice. Our findings reveal that machine learning algorithms used for recidivism forecasting exhibit statistically significant bias against Black individuals under several fairness definitions, while exhibiting no bias or bias against White individuals under other definitions. The results underscore the importance of rigorous and robust statistical testing when evaluating algorithmic decision-making systems.
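The testing recipe is straightforward to sketch: compute a fairness gap per fold and test the fold distribution against zero. A simplified version using a demographic-parity gap on a fixed model (the paper's k-fold protocol retrains per fold and covers calibration and causal metrics as well):

```python
import numpy as np
from scipy import stats

def fairness_gap_test(y_pred, group, n_folds=10, seed=0):
    """y_pred: (n,) binary predictions; group: (n,) 0/1 group labels.
    Returns the mean positive-rate disparity and its t-test p-value."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y_pred))
    gaps = []
    for fold in np.array_split(idx, n_folds):
        yp, g = y_pred[fold], group[fold]
        gaps.append(yp[g == 1].mean() - yp[g == 0].mean())  # per-fold gap
    t_stat, p_value = stats.ttest_1samp(gaps, 0.0)
    return float(np.mean(gaps)), float(p_value)
```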

#ai
#research
Score · 2.80
Parallel and Multi-Stage Knowledge Graph Retrieval for Behaviorally Aligned Financial Asset Recommendations
paper
arXiv cs.LG · 3 days ago

arXiv:2511.11583v1 Announce Type: new Abstract: Large language models (LLMs) show promise for personalized financial recommendations but are hampered by context limits, hallucinations, and a lack of behavioral grounding. Our prior work, FLARKO, embedded structured knowledge graphs (KGs) in LLM prompts to align advice with user behavior and market data. This paper introduces RAG-FLARKO, a retrieval-augmented extension of FLARKO that overcomes scalability and relevance challenges using multi-stage and parallel KG retrieval processes. Our method first retrieves behaviorally relevant entities from a user's transaction KG and then uses this context to filter temporally consistent signals from a market KG, constructing a compact, grounded subgraph for the LLM. This pipeline reduces context overhead and sharpens the model's focus on relevant information. Empirical evaluation on a real-world financial transaction dataset demonstrates that RAG-FLARKO significantly enhances recommendation quality. Notably, our framework enables smaller, more efficient models to achieve high performance in both profitability and behavioral alignment, presenting a viable path for deploying grounded financial AI in resource-constrained environments.
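A schematic of the two-stage retrieval, with the KGs represented as flat fact lists and the schema invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Fact:
    head: str
    relation: str
    tail: str
    t: int          # timestamp

def retrieve_subgraph(user_kg, market_kg, recent_assets, horizon):
    """Stage 1 keeps user-KG facts touching recently transacted assets;
    stage 2 keeps market-KG facts about those assets inside the horizon."""
    stage1 = [f for f in user_kg
              if f.head in recent_assets or f.tail in recent_assets]
    anchors = {f.head for f in stage1} | {f.tail for f in stage1}
    stage2 = [f for f in market_kg if f.head in anchors and f.t <= horizon]
    return stage1 + stage2          # compact, grounded subgraph for the prompt
```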

#ai
#llm
#research
Score · 2.80
Output Supervision Can Obfuscate the Chain of Thought
paper
arXiv cs.LG · 3 days ago

arXiv:2511.11584v1 Announce Type: new Abstract: OpenAI (2025) showed that training against a chain of thought (CoT) monitor can cause obfuscated CoTs, which contain bad behavior the monitor cannot detect. They proposed to keep CoTs monitorable by training only against output monitors that do not have access to CoT. We show that such training can still cause obfuscated CoTs via two mechanisms. First, when a model is trained to produce a safe-looking output, that model may generalize to making its CoTs look safe. Second, since later tokens are conditioned on earlier ones, safe-looking CoTs may increase the likelihood of safe outputs, causing safe-looking CoTs to be reinforced. We introduce two mitigations to address these two issues, which achieve a Pareto improvement in terms of monitorability and task performance compared to regular training.

#ai
Score · 2.80
Parameter-Efficient and Personalized Federated Training of Generative Models at the Edge
paper
arXiv cs.LG · 3 days ago

arXiv:2511.11585v1 Announce Type: new Abstract: Large generative models (for example, language and diffusion models) enable high-quality text and image synthesis but are hard to train or adapt in cross-device federated settings due to heavy computation and communication and statistical/system heterogeneity. We propose FedGen-Edge, a framework that decouples a frozen, pre-trained global backbone from lightweight client-side adapters and federates only the adapters. Using Low-Rank Adaptation (LoRA) constrains client updates to a compact subspace, which reduces uplink traffic by more than 99 percent versus full-model FedAvg, stabilizes aggregation under non-IID data, and naturally supports personalization because each client can keep a locally tuned adapter. On language modeling (PTB) and image generation (CIFAR-10), FedGen-Edge achieves lower perplexity/FID and faster convergence than strong baselines while retaining a simple FedAvg-style server. A brief ablation shows diminishing returns beyond moderate LoRA rank and a trade-off between local epochs and client drift. FedGen-Edge offers a practical path toward privacy-preserving, resource-aware, and personalized generative AI on heterogeneous edge devices.
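The server-side step reduces to FedAvg over adapter tensors only. A sketch, assuming client state dicts of torch tensors whose LoRA parameters are identifiable by name (the "lora_" convention is an assumption):

```python
def aggregate_adapters(client_states, weights=None):
    """FedAvg over LoRA adapter tensors; the frozen backbone never moves."""
    n = len(client_states)
    weights = weights if weights is not None else [1.0 / n] * n
    adapter_keys = [k for k in client_states[0] if "lora_" in k]
    return {k: sum(w * cs[k] for w, cs in zip(weights, client_states))
            for k in adapter_keys}
```

Personalization then amounts to each client keeping (or re-tuning) its local adapter instead of adopting the aggregate.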

#ai
Score · 2.80
WildfireGenome: Interpretable Machine Learning Reveals Local Drivers of Wildfire Risk and Their Cross-County Variation
paper
arXiv cs.LG · 3 days ago

arXiv:2511.11589v1 Announce Type: new Abstract: Current wildfire risk assessments rely on coarse hazard maps and opaque machine learning models that optimize regional accuracy while sacrificing interpretability at the decision scale. WildfireGenome addresses these gaps through three components: (1) fusion of seven federal wildfire indicators into a sign-aligned, PCA-based composite risk label at H3 Level-8 resolution; (2) Random Forest classification of local wildfire risk; and (3) SHAP and ICE/PDP analyses to expose county-specific nonlinear driver relationships. Across seven ecologically diverse U.S. counties, models achieve accuracies of 0.755-0.878 and Quadratic Weighted Kappa up to 0.951, with principal components explaining 87-94% of indicator variance. Transfer tests show reliable performance between ecologically similar regions but collapse across dissimilar contexts. Explanations consistently highlight needleleaf forest cover and elevation as dominant drivers, with risk rising sharply at 30-40% needleleaf coverage. WildfireGenome advances wildfire risk assessment from regional prediction to interpretable, decision-scale analytics that guide vegetation management, zoning, and infrastructure planning.
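A sketch of the ICE/PDP half of the interpretability pipeline on synthetic stand-in data; the column names and the 35% / 1500 m thresholds are invented for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import PartialDependenceDisplay

# Synthetic stand-in for H3 Level-8 cells
rng = np.random.default_rng(0)
X = pd.DataFrame({"needleleaf_pct": rng.uniform(0, 100, 2000),
                  "elevation_m": rng.uniform(0, 3000, 2000)})
y = ((X["needleleaf_pct"] > 35) & (X["elevation_m"] > 1500)).astype(int)

model = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
# ICE + PDP for one driver, the kind of curve behind the 30-40% finding above
PartialDependenceDisplay.from_estimator(model, X, ["needleleaf_pct"], kind="both")
```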

#ai
Score · 2.80
Sound Logical Explanations for Mean Aggregation Graph Neural Networks
paper
arXiv cs.LG · 3 days ago

arXiv:2511.11593v1 Announce Type: new Abstract: Graph neural networks (GNNs) are frequently used for knowledge graph completion. Their black-box nature has motivated work that uses sound logical rules to explain predictions and characterise their expressivity. However, despite the prevalence of GNNs that use mean as an aggregation function, explainability and expressivity results are lacking for them. We consider GNNs with mean aggregation and non-negative weights (MAGNNs), proving the precise class of monotonic rules that can be sound for them, as well as providing a restricted fragment of first-order logic to explain any MAGNN prediction. Our experiments show that restricting mean-aggregation GNNs to have non-negative weights yields comparable or improved performance on standard inductive benchmarks, that sound rules are obtained in practice, that insightful explanations can be generated in practice, and that the sound rules can expose issues in the trained models.
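A sketch of a mean-aggregation layer with the non-negative-weight restriction, enforced here by clamping at forward time (the paper's exact parameterization may differ):

```python
import torch
import torch.nn as nn

class MAGNNLayer(nn.Module):
    """Mean aggregation over neighbors with non-negative weight matrices."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.w_self = nn.Parameter(torch.rand(d_in, d_out))
        self.w_nbr = nn.Parameter(torch.rand(d_in, d_out))
    def forward(self, h, adj):
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        mean_nbr = (adj @ h) / deg                      # mean aggregation
        # non-negative weights keep the layer monotone, the property behind
        # the sound monotonic rules discussed above
        return torch.relu(h @ self.w_self.clamp(min=0)
                          + mean_nbr @ self.w_nbr.clamp(min=0))
```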

#ai
Score · 2.80