Fresh from the feed
arXiv:2511.11729v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly deployed under the Model-as-a-Service (MaaS) paradigm. To meet stringent quality-of-service (QoS) requirements, existing LLM serving systems disaggregate the prefill and decode phases of inference. However, decode instances often suffer low GPU utilization due to their memory-bound nature and insufficient batching under dynamic workloads, leaving compute resources underutilized. We introduce Harli, a serving system that improves GPU utilization by co-locating parameter-efficient finetuning (PEFT) tasks with LLM decode instances. PEFT tasks are compute-bound and memory-efficient, making them ideal candidates for safe co-location. Specifically, Harli addresses two key challenges, limited memory and unpredictable interference, using three components: a unified memory allocator for runtime memory reuse, a two-stage latency predictor for decode latency modeling, and a QoS-guaranteed scheduler for throughput maximization. Experimental results show that Harli improves finetuning throughput by 46.2% on average (up to 92.0%) over state-of-the-art serving systems, while maintaining strict QoS guarantees for inference decode.
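As a concrete illustration of the co-location decision such a system has to make, here is a minimal sketch assuming a toy two-term latency predictor: PEFT microbatches are admitted onto a decode instance only while the predicted decode latency stays under the SLO. All names, coefficients, and the predictor itself are illustrative stand-ins, not Harli's actual components.

```python
# Hypothetical co-location admission sketch (not Harli's code):
# admit PEFT work only while predicted decode latency stays under the SLO.
from dataclasses import dataclass

@dataclass
class DecodeState:
    batch_size: int          # current decode batch size
    kv_cache_bytes: int      # memory held by KV caches

def predict_decode_latency_ms(state: DecodeState, peft_flops: float) -> float:
    """Stand-in two-term predictor: a memory-bound base term plus a
    compute-interference term (coefficients are illustrative)."""
    base = 0.02 * state.batch_size + 1e-9 * state.kv_cache_bytes
    interference = 1e-12 * peft_flops
    return base + interference

def admit_peft(state: DecodeState, candidate_flops: list[float],
               slo_ms: float) -> list[float]:
    """Greedily admit PEFT microbatches while the latency SLO holds."""
    admitted, used = [], 0.0
    for f in sorted(candidate_flops):
        if predict_decode_latency_ms(state, used + f) <= slo_ms:
            admitted.append(f)
            used += f
    return admitted

print(admit_peft(DecodeState(32, 8 << 30), [1e12, 5e11, 2e12], slo_ms=10.0))
```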
arXiv:2511.11649v1 Announce Type: cross Abstract: Ensemble techniques in recommender systems have demonstrated accuracy improvements of 10-30%, yet their environmental impact remains unmeasured. While deep learning recommendation algorithms can generate up to 3,297 kg CO2 per paper, ensemble methods have not been sufficiently evaluated for energy consumption. This thesis investigates how ensemble techniques influence environmental impact compared to single optimized models. We conducted 93 experiments across two frameworks (Surprise for rating prediction, LensKit for ranking) on four datasets spanning 100,000 to 7.8 million interactions. We evaluated four ensemble strategies (Average, Weighted, Stacking/Rank Fusion, Top Performers) against simple baselines and optimized single models, measuring energy consumption with a smart plug. Results revealed a non-linear accuracy-energy relationship. Ensemble methods achieved 0.3-5.7% accuracy improvements while consuming 19-2,549% more energy depending on dataset size and strategy. The Top Performers ensemble showed best efficiency: 0.96% RMSE improvement with 18.8% energy overhead on MovieLens-1M, and 5.7% NDCG improvement with 103% overhead on MovieLens-100K. Exhaustive averaging strategies consumed 88-270% more energy for comparable gains. On the largest dataset (Anime, 7.8M interactions), the Surprise ensemble consumed 2,005% more energy (0.21 Wh vs. 0.01 Wh) for 1.2% accuracy improvement, producing 53.8 mg CO2 versus 2.6 mg CO2 for the single model. This research provides one of the first systematic measurements of energy and carbon footprint for ensemble recommender systems, demonstrates that selective strategies offer superior efficiency over exhaustive averaging, and identifies scalability limitations at industrial scale. These findings enable informed decisions about sustainable algorithm selection in recommender systems.
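The three ensemble families compared above can be sketched in a few lines; the models, predictions, and inverse-error weighting below are illustrative assumptions, not the thesis's exact configuration.

```python
# Illustrative sketch of Average, Weighted, and Top Performers ensembles
# over per-model rating predictions (all numbers are made up).
import numpy as np

preds = {                      # per-model rating predictions for 4 items
    "svd": np.array([3.9, 2.1, 4.4, 3.0]),
    "knn": np.array([4.1, 2.5, 4.0, 3.2]),
    "nmf": np.array([3.5, 2.0, 4.6, 2.7]),
}
val_rmse = {"svd": 0.87, "knn": 0.92, "nmf": 0.95}    # validation errors

average = np.mean(list(preds.values()), axis=0)        # Average ensemble

w = np.array([1 / val_rmse[m] for m in preds])         # inverse-error weights
weighted = (w / w.sum()) @ np.array(list(preds.values()))

top = sorted(preds, key=val_rmse.get)[:2]              # Top Performers: best 2
top_performers = np.mean([preds[m] for m in top], axis=0)

print(average, weighted, top_performers, sep="\n")
```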
arXiv:2511.11640v1 Announce Type: cross Abstract: Speculative backpropagation has emerged as a promising technique to accelerate the training of neural networks by overlapping the forward and backward passes. Leveraging speculative weight updates when error gradients fall within a specific threshold reduces training time without substantially compromising accuracy. In this work, we implement speculative backpropagation on the MNIST dataset using OpenMP as the parallel programming platform. OpenMP's multi-threading capabilities enable simultaneous execution of forward and speculative backpropagation steps, significantly improving training speed. The application is planned for synthesis on a state-of-the-art FPGA to demonstrate its potential for hardware acceleration. Our CPU-based experimental results demonstrate that speculative backpropagation achieves a maximum speedup of 24% in execution time when using a threshold of 0.25, with accuracy remaining within 3-4% of the baseline across various epochs. Additionally, when comparing individual step execution time, speculative backpropagation yields a maximum speedup of 35% over the baseline, demonstrating the effectiveness of overlapping forward and backward passes.
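A minimal single-threaded sketch of the thresholded speculation rule, on a logistic-regression stand-in for the network: the stale gradient drives a speculative update, which is kept only if the fresh gradient lands within the threshold of the previous one. The OpenMP implementation overlaps these steps on separate threads; all numbers and the model here are ours.

```python
# Toy speculative-backprop loop: speculate with the stale gradient,
# validate against the fresh one, roll back if the mismatch is too large.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 20))
y = (X @ rng.normal(size=20) > 0).astype(float)
w, lr, threshold = np.zeros(20), 0.5, 0.25
prev_grad = None

for step in range(100):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))             # forward pass
    if prev_grad is not None:
        w_spec = w - lr * prev_grad                # speculative update (stale grad)
    grad = X.T @ (p - y) / len(y)                  # backward pass (overlapped in the paper)
    if prev_grad is not None and np.linalg.norm(grad - prev_grad) < threshold:
        w = w_spec                                 # speculation validated: commit
    else:
        w = w - lr * grad                          # roll back to the true update
    prev_grad = grad

print(f"train accuracy: {np.mean((p > 0.5) == y):.3f}")
```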
arXiv:2511.11626v1 Announce Type: cross Abstract: Developing large-scale foundational datasets is a critical milestone in advancing artificial intelligence (AI)-driven scientific innovation. However, unlike AI-mature fields such as natural language processing, materials science, particularly polymer research, has significantly lagged in developing extensive open datasets. This lag is primarily due to the high costs of polymer synthesis and property measurements, along with the vastness and complexity of the chemical space. This study presents PolyOmics, an omics-scale computational database generated through fully automated molecular dynamics simulation pipelines that provide diverse physical properties for over $10^5$ polymeric materials. The PolyOmics database is collaboratively developed by approximately 260 researchers from 48 institutions to bridge the gap between academia and industry. Machine learning models pretrained on PolyOmics can be efficiently fine-tuned for a wide range of real-world downstream tasks, even when only limited experimental data are available. Notably, the generalisation capability of these simulation-to-real transfer models improves significantly as the size of the PolyOmics database increases, exhibiting power-law scaling. The emergence of scaling laws supports the "more is better" principle, highlighting the significance of ultralarge-scale computational materials data for improving real-world prediction performance. This unprecedented omics-scale database reveals vast unexplored regions of polymer materials, providing a foundation for AI-driven polymer science.
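The reported power-law scaling can be checked with an ordinary log-log fit; the dataset sizes and errors below are made-up stand-ins for the simulation-to-real transfer results.

```python
# Reading a power-law exponent off a log-log fit (synthetic numbers).
import numpy as np

n = np.array([1e3, 1e4, 1e5])            # pretraining set sizes (illustrative)
err = np.array([0.30, 0.19, 0.12])       # downstream test error (made up)
slope, intercept = np.polyfit(np.log(n), np.log(err), 1)
print(f"error ~ N^{slope:.2f}")          # negative exponent => power-law decay
```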
arXiv:2511.11616v1 Announce Type: cross Abstract: Real-time performance, adversarial resiliency, and privacy preservation are the key metrics that must be balanced to achieve collision avoidance in large-scale multi-UAV (Unmanned Aerial Vehicle) systems. Current frameworks tend to prescribe monolithic solutions that are prohibitively computationally complex, with a scaling cost of $O(n^2)$, and offer no Byzantine fault tolerance. The hierarchical framework proposed in this paper eliminates such trade-offs through a stratified three-layer architecture. We spread the intelligence across three layers: a local layer for immediate collision avoidance running dense graph attention with latency of $<10 ms$; a regional layer using sparse attention with $O(nk)$ computational complexity and asynchronous federated learning with coordinate-wise trimmed mean aggregation; and a global layer using a lightweight Hashgraph-inspired protocol. We propose an adaptive differential privacy mechanism in which the noise level $(\epsilon \in [0.1, 1.0])$ is adjusted dynamically based on an evaluation of the measured real-time threat, improving the privacy-utility tradeoff. Through the use of Distributed Hash Table (DHT)-based lightweight audit logging instead of heavyweight blockchain consensus, $95^{th}$-percentile decision latencies below 50 ms are observed across all tested swarm sizes. The architecture scales to 500 UAVs with a collision rate of $< 2.0\%$ and Byzantine fault tolerance of $f < n/3$.
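A minimal sketch of the adaptive differential-privacy rule described above, assuming a Laplace mechanism: a higher assessed threat maps to a larger epsilon within the paper's [0.1, 1.0] range, so positions shared during emergencies carry less noise. The threat-to-epsilon mapping and the sensitivity value are our assumptions, not the paper's calibration.

```python
# Hypothetical adaptive-epsilon Laplace mechanism for shared positions.
import numpy as np

def adaptive_epsilon(threat: float, eps_min=0.1, eps_max=1.0) -> float:
    """threat in [0, 1]; interpolate within the paper's epsilon range."""
    return eps_min + (eps_max - eps_min) * np.clip(threat, 0.0, 1.0)

def privatize_position(pos: np.ndarray, threat: float,
                       sensitivity: float = 1.0,
                       rng=np.random.default_rng(0)) -> np.ndarray:
    eps = adaptive_epsilon(threat)                 # high threat -> less noise
    return pos + rng.laplace(scale=sensitivity / eps, size=pos.shape)

print(privatize_position(np.array([10.0, 5.0, 2.0]), threat=0.9))
```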
arXiv:2511.11615v1 Announce Type: cross Abstract: Passive acoustic monitoring is a sustainable method of monitoring wildlife and environments that leads to the generation of large datasets and, currently, a processing backlog. Academic research into automating this process is focused on the application of resource-intensive convolutional neural networks, which require large pre-labelled datasets for training and lack flexibility in application. We present a viable alternative relevant in both wild and captive settings: a transparent, lightweight and fast-to-train associative memory AI model with Hopfield neural network (HNN) architecture. Adapted from a model developed to detect bat echolocation calls, this model monitors vocalisations of the captive endangered black-and-white ruffed lemur (Varecia variegata). Lemur social calls of interest for welfare monitoring are stored in the HNN in order to detect other call instances across the larger acoustic dataset. We make significant model improvements by storing an additional signal caused by movement and achieve an overall accuracy of 0.94. The model can perform 340 classifications per second, processing over 5.5 hours of audio data per minute, on a standard laptop running other applications. It has broad applicability and trains in milliseconds. Our lightweight solution reduces data-to-insight turnaround times and can accelerate decision making in both captive and wild settings.
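The associative-memory mechanism is easy to sketch: a Hopfield network stores a template pattern with the Hebbian outer-product rule and recovers it from a corrupted copy. The real model stores spectral features of lemur calls; the 16-bit bipolar patterns here are placeholders.

```python
# Toy Hopfield associative memory: store one "call" template, recall it
# from a noisy observation.
import numpy as np

def train_hopfield(patterns: np.ndarray) -> np.ndarray:
    """Hebbian outer-product rule with zero diagonal."""
    W = sum(np.outer(p, p) for p in patterns).astype(float)
    np.fill_diagonal(W, 0.0)
    return W / len(patterns)

def recall(W: np.ndarray, x: np.ndarray, steps: int = 10) -> np.ndarray:
    for _ in range(steps):
        x = np.sign(W @ x)
        x[x == 0] = 1
    return x

rng = np.random.default_rng(1)
call = rng.choice([-1, 1], size=16)           # stored template
noisy = call.copy(); noisy[:3] *= -1          # corrupt 3 of 16 bits
W = train_hopfield(np.array([call]))
print(np.array_equal(recall(W, noisy.astype(float)), call))  # True
```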
arXiv:2511.11613v1 Announce Type: cross Abstract: Buried pipelines transporting oil and gas across geohazard-prone regions are exposed to potential ground movement, leading to the risk of significant strain demand and structural failure. Reliability analysis, which determines the probability of failure after accounting for pertinent uncertainties, is essential for ensuring the safety of pipeline systems. However, traditional reliability analysis methods involving computationally intensive numerical models, such as finite element simulations of pipelines subjected to ground movement, have limited applications; this is partly because stochastic sampling approaches require repeated simulations over a large number of samples of the uncertain variables when estimating low probabilities. This study introduces Physics-Informed Neural Network for Reliability Analysis (PINN-RA) for buried pipelines subjected to ground movement, which integrates a PINN-based surrogate model with Monte Carlo Simulation (MCS) to achieve efficient reliability assessment. To enable its application under uncertain variables associated with soil properties and ground movement, the PINN-based surrogate model is extended to solve a parametric differential equation system, namely the governing equation of pipelines embedded in soils with different properties. The findings demonstrate that PINN-RA significantly reduces the computational effort required and thus accelerates reliability analysis. By eliminating the need for repetitive numerical evaluations of pipelines subjected to permanent ground movement, the proposed approach provides an efficient and scalable tool for pipeline reliability assessment, enabling rapid decision-making in geohazard-prone regions.
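A minimal PINN residual for one common idealization of a buried pipeline, an Euler-Bernoulli beam on a Winkler foundation (EI u'''' + k u = q), can be written directly with autograd; the paper's parametric version additionally feeds soil and ground-movement parameters as network inputs. The constants and architecture below are illustrative.

```python
# PINN physics-residual sketch for EI*u'''' + k*u = q on a toy network.
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))

def pinn_residual(x, EI=1.0, k=2.0, q=1.0):
    x = x.requires_grad_(True)
    u = net(x)
    grads = u
    for _ in range(4):                        # fourth derivative via autograd
        grads = torch.autograd.grad(grads.sum(), x, create_graph=True)[0]
    return EI * grads + k * u - q

x = torch.rand(64, 1)                         # collocation points
loss = pinn_residual(x).pow(2).mean()         # physics loss to minimize
loss.backward()
print(f"physics residual loss: {loss.item():.4f}")
```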
arXiv:2511.11653v1 Announce Type: cross Abstract: Large Language Models have shown strong potential as rerankers to enhance the overall performance of RAG systems. However, existing reranking paradigms are constrained by a core theoretical and practical dilemma: Pointwise methods, while simple and highly flexible, evaluate documents independently, making them prone to the Ranking Myopia Trap, overlooking the relative importance between documents. In contrast, Listwise methods can perceive the global ranking context, but suffer from inherent List Rigidity, leading to severe scalability and flexibility issues when handling large candidate sets. To address these challenges, we propose Groupwise, a novel reranking paradigm. In this approach, the query and a group of candidate documents are jointly fed into the model, which performs within-group comparisons to assign individual relevance scores to each document. This design retains the flexibility of Pointwise methods while enabling the comparative capability of Listwise methods. We further adopt GRPO for model training, equipped with a heterogeneous reward function that integrates ranking metrics with a distributional reward aimed at aligning score distributions across groups. To overcome the bottleneck caused by the scarcity of high-quality labeled data, we further propose an innovative pipeline for synthesizing high-quality retrieval and ranking data. The resulting data can be leveraged not only for training the reranker but also for training the retriever. Extensive experiments on two reasoning-intensive retrieval benchmarks, BRIGHT and R2MED, validate the effectiveness of our approach.
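The Groupwise calling convention can be sketched independently of the model: the query and a group of candidates are scored in one call, so each document's score can reflect within-group comparison. The lexical scorer below is a toy stand-in for the LLM; only the interface mirrors the paradigm.

```python
# Schematic groupwise interface: one call scores the whole group, and
# scores are calibrated within the group rather than per document.
def groupwise_scores(query: str, group: list[str]) -> list[float]:
    q = set(query.lower().split())
    raw = [len(q & set(d.lower().split())) / (len(q) or 1) for d in group]
    mx = max(raw) or 1.0
    return [r / mx for r in raw]          # within-group normalization

docs = ["rag rerankers compare documents jointly",
        "pointwise scoring reads one document",
        "cooking pasta for beginners"]
print(groupwise_scores("how do rerankers compare documents", docs))
```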
arXiv:2511.11601v1 Announce Type: cross Abstract: While NVIDIA remains the dominant provider of AI accelerators within cloud data centers, emerging vendors such as AMD, Intel, Mac, and Huawei offer cost-effective alternatives with claims of compatibility and performance. This paper presents the first empirical study investigating divergence in machine learning model behavior across heterogeneous AI accelerators. Utilizing an automated pipeline, we synthesize over 100,000 variant models derived from 4,000 real-world models and execute them across five different enterprise-grade accelerators. Our findings suggest that newer AI platforms from Mac and Huawei support at least 17% fewer operators than NVIDIA. These platforms also exhibit a higher rate of output discrepancies (exceeding 5%), which stem from differences in operator implementations, handling of exceptional numerical values, and instruction scheduling. They are also more susceptible to failures during model compilation-based acceleration, and in some cases, the compiled models produce outputs that differ noticeably from those generated using the standard execution mode. In addition, we identify 7 implementation flaws in PyTorch and 40 platform-specific issues across vendors. These results underscore the challenges of achieving consistent machine learning behavior in an increasingly diverse hardware ecosystem.
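A sketch of the differential-testing loop behind such numbers: execute the same model on a reference backend and on an accelerator, then flag outputs that diverge beyond a tolerance. The model, tolerance, and device strings are illustrative, not the study's pipeline.

```python
# Cross-backend differential test: same model, same input, two devices.
import torch

model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.GELU())
x = torch.randn(8, 16)
ref = model(x)                                 # reference backend (CPU)

device = "cuda" if torch.cuda.is_available() else "cpu"  # swap in vendor backend
out = model.to(device)(x.to(device)).cpu()

max_err = (out - ref).abs().max().item()
print(f"max abs divergence: {max_err:.2e}",
      "OK" if torch.allclose(out, ref, atol=1e-5) else "DISCREPANCY")
```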
arXiv:2511.10709v1 Announce Type: cross Abstract: Machine learning models are used for pattern recognition analysis of big data, without direct human intervention. The task of unsupervised learning is to find the probability distribution that would best describe the available data, and then use it to make predictions for observables of interest. Classical models generally fit the data to Boltzmann distribution of Hamiltonians with a large number of tunable parameters. Quantum extensions of these models replace classical probability distributions with quantum density matrices. An advantage can be obtained only when features of density matrices that are absent in classical probability distributions are exploited. Such situations depend on the input data as well as the targeted observables. Explicit examples are discussed that bring out the constraints limiting possible quantum advantage. The problem-dependent extent of quantum advantage has implications for both data analysis and sensing applications.
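In standard notation, the substitution the abstract describes is the following, with the classical Hamiltonian promoted to an operator and the partition sum to a trace:

```latex
\[
  p_\theta(x) \;=\; \frac{e^{-H_\theta(x)}}{\sum_{x'} e^{-H_\theta(x')}}
  \qquad\longrightarrow\qquad
  \rho_\theta \;=\; \frac{e^{-\hat H_\theta}}{\operatorname{Tr}\, e^{-\hat H_\theta}}
\]
```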
arXiv:2506.14157v1 Announce Type: cross Abstract: Recent research has attempted to associate preference optimization (PO) performance with the underlying preference datasets. In this work, our observation is that the differences between the preferred response $y^+$ and dispreferred response $y^-$ influence what LLMs can learn, which may not match the desirable differences to learn. Therefore, we use distance and reward margin to quantify these differences, and combine them to get Distance Calibrated Reward Margin (DCRM), a metric that measures the quality of a response pair for PO. Intuitively, DCRM encourages minimal noisy differences and maximal desired differences. With this, we study 3 types of commonly used preference datasets, classified along two axes: the source of the responses and the preference labeling function. We establish a general correlation between higher DCRM of the training set and better learning outcome. Inspired by this, we propose a best-of-$N^2$ pairing method that selects response pairs with the highest DCRM. Empirically, in various settings, our method produces training datasets that can further improve models' performance on AlpacaEval, MT-Bench, and Arena-Hard over the existing training sets.
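A hedged sketch of a DCRM-style score: the exact combination below, reward margin discounted by a normalized string distance, is our assumption rather than the paper's formula, but it captures the stated intuition of rewarding large desired differences while penalizing noisy ones.

```python
# Assumed DCRM-style pair quality score and best-of-N^2-style selection.
from difflib import SequenceMatcher

def dcrm(y_plus: str, y_minus: str, r_plus: float, r_minus: float) -> float:
    distance = 1.0 - SequenceMatcher(None, y_plus, y_minus).ratio()
    reward_margin = r_plus - r_minus
    return reward_margin / (1.0 + distance)   # large margin, small noisy distance

pairs = [("The capital is Paris.", "The capital is Lyon.", 0.9, 0.2),
         ("Paris.", "It could be many cities, hard to say.", 0.8, 0.3)]
print(max(pairs, key=lambda p: dcrm(*p)))     # keep the highest-DCRM pair
```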
arXiv:2511.13712v1 Announce Type: new Abstract: As climate change accelerates the frequency and severity of extreme events such as wildfires, the need for accurate, explainable, and actionable forecasting becomes increasingly urgent. While artificial intelligence (AI) models have shown promise in predicting such events, their adoption in real-world decision-making remains limited due to their black-box nature, which undermines trust, explainability, and operational readiness. This paper investigates the role of explainable AI (XAI) in bridging the gap between predictive accuracy and actionable insight for extreme event forecasting. Using wildfire prediction as a case study, we evaluate various AI models and employ SHapley Additive exPlanations (SHAP) to uncover key features, decision pathways, and potential biases in model behavior. Our analysis demonstrates how XAI not only clarifies model reasoning but also supports critical decision-making by domain experts and response teams. In addition, we provide supporting visualizations that enhance the interpretability of XAI outputs by contextualizing feature importance and temporal patterns in seasonality and geospatial characteristics. This approach enhances the usability of AI explanations for practitioners and policymakers. Our findings highlight the need for AI systems that are not only accurate but also interpretable, accessible, and trustworthy, essential for effective use in disaster preparedness, risk mitigation, and climate resilience planning.
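The SHAP workflow described above looks roughly like this, assuming the shap and scikit-learn packages; the gradient-boosted model and the "wildfire driver" feature names are illustrative, not the paper's data or models.

```python
# SHAP feature-attribution sketch on a synthetic wildfire-style dataset.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                 # temp, wind, humidity, NDVI (made up)
y = (X[:, 0] + 0.5 * X[:, 1] - X[:, 2] + rng.normal(scale=0.5, size=500) > 0)
model = GradientBoostingClassifier().fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:50])   # per-sample, per-feature attributions
for name, s in zip(["temperature", "wind", "humidity", "NDVI"],
                   np.abs(shap_values).mean(axis=0)):
    print(f"{name:12s} mean |SHAP| = {s:.3f}")
```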
arXiv:2511.11611v1 Announce Type: cross Abstract: We introduce a quantitative framework for separating skill and chance in games by modeling them as complementary sources of control over stochastic decision trees. We define the Skill-Luck Index S(G) in [-1, 1] by decomposing game outcomes into skill leverage K and luck leverage L, and introduce volatility Sigma to quantify outcome uncertainty over successive turns. Applying this to 30 games reveals a continuum from pure chance (coin toss, S = -1) through mixed domains such as backgammon (S = 0, Sigma = 1.20) to pure skill (chess, S = +1, Sigma = 0). Poker exhibits moderate skill dominance (S = 0.33) with K = 0.40 +/- 0.03 and Sigma = 0.80. The framework extends to general stochastic decision systems, enabling principled comparisons of player influence, game balance, and predictive stability, with applications to game design, AI evaluation, and risk assessment.
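The abstract does not state the formula, but a decomposition consistent with the reported endpoints (S = -1 when K = 0, S = +1 when L = 0) would be the normalized difference below; treat it as our reconstruction, not the paper's definition.

```latex
% Hedged reconstruction of a Skill-Luck Index consistent with the
% reported endpoints (not stated in the abstract):
\[
  S(G) \;=\; \frac{K - L}{K + L} \;\in\; [-1, 1]
\]
% Worked check: poker's S = 0.33 with K = 0.40 would imply
% L = K\,(1 - S)/(1 + S) \approx 0.20.
```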
arXiv:2511.13705v1 Announce Type: new Abstract: Unsupervised learning on high-dimensional RNA-seq data can reveal molecular subtypes beyond standard labels. We combine an autoencoder-based representation with clustering and stability analysis to search for rare but reproducible genomic subtypes. On the UCI "Gene Expression Cancer RNA-Seq" dataset (801 samples, 20,531 genes; BRCA, COAD, KIRC, LUAD, PRAD), a pan-cancer analysis shows clusters aligning almost perfectly with tissue of origin (Cramer's V = 0.887), serving as a negative control. We therefore reframe the problem within KIRC (n = 146): we select the top 2,000 highly variable genes, standardize them, train a feed-forward autoencoder (128-dimensional latent space), and run k-means for k = 2-10. While global indices favor small k, scanning k with a pre-specified discovery rule (a rare cluster must recur with Jaccard >= 0.60 across 20 seeds after Hungarian alignment) yields a simple solution at k = 5 (silhouette = 0.129, DBI = 2.045) with a rare cluster C0 (6.85 percent of patients) that is highly stable (Jaccard = 0.787). Cluster-vs-rest differential expression (Welch's t-test, Benjamini-Hochberg FDR) identifies coherent markers. Overall, pan-cancer clustering is dominated by tissue of origin, whereas a stability-aware within-cancer approach reveals a rare, reproducible KIRC subtype.
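A compact sketch of the stability scan: cluster a latent matrix with several k-means seeds, align each run to a reference run with the Hungarian algorithm, and report per-cluster mean Jaccard. Z below is synthetic; in the paper it is the 128-dimensional autoencoder embedding.

```python
# Seed-stability scan with Hungarian label alignment and Jaccard scores.
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
Z = np.vstack([rng.normal(m, 0.3, size=(n, 8))
               for m, n in [(0, 60), (2, 60), (5, 10)]])   # one rare group
k = 3
ref = KMeans(k, n_init=10, random_state=0).fit_predict(Z)

def align(labels, ref, k):
    """Map labels onto ref's labeling by maximizing overlap (Hungarian)."""
    cost = -np.array([[np.sum((labels == i) & (ref == j)) for j in range(k)]
                      for i in range(k)])
    _, col = linear_sum_assignment(cost)
    return np.array([col[l] for l in labels])

jacc = np.zeros(k)
for seed in range(20):
    lab = align(KMeans(k, n_init=10, random_state=seed).fit_predict(Z), ref, k)
    for c in range(k):
        a, b = lab == c, ref == c
        jacc[c] += np.sum(a & b) / np.sum(a | b)
print("per-cluster Jaccard stability:", np.round(jacc / 20, 3))
```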
arXiv:2511.13702v1 Announce Type: new Abstract: Travel mode identification (TMI) from GPS trajectories is critical for urban intelligence, but is hampered by the high cost of annotation, leading to severe label scarcity. Prevailing semi-supervised learning (SSL) methods are ill-suited for this task, as they suffer from catastrophic confirmation bias and ignore the intrinsic data manifold. We propose ST-ProC, a novel graph-prototypical multi-objective SSL framework to address these limitations. Our framework synergizes a graph-prototypical core with foundational SSL support. The core exploits the data manifold via graph regularization, prototypical anchoring, and a novel, margin-aware pseudo-labeling strategy to actively reject noise. This core is supported and stabilized by foundational contrastive and teacher-student consistency losses, ensuring high-quality representations and robust optimization. ST-ProC outperforms all baselines by a significant margin, demonstrating its efficacy in real-world sparse-label settings, with a performance boost of 21.5% over state-of-the-art methods like FixMatch.
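The margin-aware pseudo-labeling idea can be shown in isolation: a pseudo-label is kept only when the gap between the top two class probabilities clears a margin, so ambiguous trajectories are rejected rather than reinforcing confirmation bias. The threshold value is ours.

```python
# Margin-aware pseudo-label filter: keep only confident predictions.
import numpy as np

def margin_pseudo_labels(probs: np.ndarray, margin: float = 0.3):
    top2 = np.sort(probs, axis=1)[:, -2:]          # two largest per row
    keep = (top2[:, 1] - top2[:, 0]) >= margin     # top-1 minus top-2 gap
    return probs.argmax(1), keep

probs = np.array([[0.7, 0.2, 0.1],     # confident -> kept
                  [0.4, 0.35, 0.25]])  # ambiguous -> rejected
labels, keep = margin_pseudo_labels(probs)
print(labels, keep)                    # [0 0] [ True False]
```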
arXiv:2511.13701v1 Announce Type: new Abstract: Understanding the principles that govern dynamical systems is a central challenge across many scientific domains, including biology and ecology. Incomplete knowledge of nonlinear interactions and stochastic effects often renders bottom-up modeling approaches ineffective, motivating the development of methods that can discover governing equations directly from data. In such contexts, parametric models often struggle without strong prior knowledge, especially when estimating intrinsic noise. Nonetheless, incorporating stochastic effects is often essential for understanding the dynamic behavior of complex systems such as gene regulatory networks and signaling pathways. To address these challenges, we introduce Trine (Three-phase Regression for INtrinsic noisE), a nonparametric, kernel-based framework that infers state-dependent intrinsic noise from time-series data. Trine features a three-stage algorithm that combines analytically solvable subproblems with a structured kernel architecture that captures both abrupt noise-driven fluctuations and smooth, state-dependent changes in variance. We validate Trine on biological and ecological systems, demonstrating its ability to uncover hidden dynamics without relying on predefined parametric assumptions. Across several benchmark problems, Trine achieves performance comparable to that of an oracle. Biologically, this oracle can be viewed as an idealized observer capable of directly tracking the random fluctuations in molecular concentrations or reaction events within a cell. The Trine framework thus opens new avenues for understanding how intrinsic noise affects the behavior of complex systems.
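A minimal nonparametric sketch in the spirit of Trine's goal, though not its three-stage algorithm: state-dependent noise can be read off squared increments of a time series with Nadaraya-Watson kernel smoothing. The simulated diffusion and bandwidth are ours.

```python
# Kernel estimate of state-dependent noise sigma^2(x) from increments.
import numpy as np

rng = np.random.default_rng(0)
dt, n = 0.01, 5000
x = np.zeros(n)
for t in range(n - 1):                 # dX = -X dt + sqrt(1 + X^2) dW
    x[t + 1] = x[t] - x[t] * dt + np.sqrt((1 + x[t] ** 2) * dt) * rng.normal()

def sigma2_hat(x0, h=0.3):
    """Nadaraya-Watson smoothing of squared increments around state x0."""
    w = np.exp(-0.5 * ((x[:-1] - x0) / h) ** 2)
    return np.sum(w * np.diff(x) ** 2) / (np.sum(w) * dt)

for x0 in [-1.0, 0.0, 1.0]:
    print(f"sigma^2({x0:+.0f}) ~ {sigma2_hat(x0):.2f}  (true {1 + x0**2:.2f})")
```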
arXiv:2511.13685v1 Announce Type: new Abstract: In this study, we tackle the challenging task of predicting secondary structures from protein primary sequences, a pivotal initial stride towards predicting tertiary structures, while yielding crucial insights into protein activity, relationships, and functions. Existing methods often utilize extensive sets of unlabeled amino acid sequences. However, these approaches neither explicitly capture nor harness the accessible protein 3D structural data, which is recognized as a decisive factor in dictating protein functions. To address this, we utilize protein residue graphs and introduce various forms of sequential or structural connections to capture enhanced spatial information. We adeptly combine Graph Neural Networks (GNNs) and Language Models (LMs), specifically utilizing a pre-trained transformer-based protein language model to encode amino acid sequences and employing message-passing mechanisms like GCN and R-GCN to capture geometric characteristics of protein structures. By employing convolution over each node's local neighborhood, including relation types, we stack multiple convolutional layers to efficiently learn combined insights from the protein's spatial graph, revealing intricate interconnections and dependencies in its structural arrangement. To assess our model's performance, we employed the training dataset provided by NetSurfP-2.0, which annotates secondary structure in 3- and 8-state classifications. Extensive experiments show that our proposed model, SSRGNet, surpasses the baseline on F1 scores.
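One message-passing layer over a residue graph, of the kind stacked in the model, can be sketched as follows; the random adjacency and features stand in for residue contacts and protein-LM embeddings, and the R-GCN relation types are omitted.

```python
# Single GCN layer over a toy residue graph: H' = ReLU(A_norm H W).
import torch

def gcn_layer(A, H, W):
    """Symmetric-normalized GCN propagation with self-loops."""
    A_hat = A + torch.eye(A.size(0))
    d = A_hat.sum(1).pow(-0.5)
    A_norm = d[:, None] * A_hat * d[None, :]
    return torch.relu(A_norm @ H @ W)

n, fin, fout = 6, 8, 4                      # 6 residues (illustrative)
A = (torch.rand(n, n) > 0.6).float()
A = ((A + A.T) > 0).float()                 # symmetric contact map stand-in
H = torch.randn(n, fin)                     # stand-in for protein-LM embeddings
W = torch.randn(fin, fout)
print(gcn_layer(A, H, W).shape)             # torch.Size([6, 4])
```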
arXiv:2511.13680v1 Announce Type: new Abstract: A learning task, understood as the problem of fitting a parametric model from supervised data, fundamentally requires the dataset to be large enough to be representative of the underlying distribution of the source. When data is limited, the learned models fail to generalize to cases not seen during training. This paper introduces a multi-task \emph{cross-learning} framework to overcome data scarcity by jointly estimating \emph{deterministic} parameters across multiple, related tasks. We formulate this joint estimation as a constrained optimization problem, where the constraints dictate the resulting similarity between the parameters of the different models, allowing the estimated parameters to differ across tasks while still combining information from multiple data sources. This framework enables knowledge transfer from tasks with abundant data to those with scarce data, leading to more accurate and reliable parameter estimates, providing a solution for scenarios where parameter inference from limited data is critical. We provide theoretical guarantees in a controlled framework with Gaussian data, and show the efficiency of our cross-learning method in applications with real data including image classification and propagation of infectious diseases.
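A minimal sketch of the cross-learning idea under our own illustrative formulation: each task fits its own linear parameters, while a coupling penalty (a Lagrangian relaxation of the similarity constraint) pulls them together, so the data-rich task stabilizes the data-poor one.

```python
# Coupled multi-task least squares: per-task parameters pulled toward
# the cross-task mean by a similarity penalty (illustrative objective).
import numpy as np

def cross_learn(tasks, lam=5.0, iters=100):
    """tasks: list of (X, y). Approximately minimize
    sum_t ||X_t w_t - y_t||^2 + lam * sum_t ||w_t - w_mean||^2
    by alternating ridge-style block updates against the current mean."""
    d = tasks[0][0].shape[1]
    W = np.zeros((len(tasks), d))
    for _ in range(iters):
        w_mean = W.mean(axis=0)
        for t, (X, y) in enumerate(tasks):
            A = X.T @ X + lam * np.eye(d)
            W[t] = np.linalg.solve(A, X.T @ y + lam * w_mean)
    return W

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
Xb = rng.normal(size=(500, 3))                        # data-rich task
yb = Xb @ w_true + rng.normal(scale=0.1, size=500)
Xs = rng.normal(size=(5, 3))                          # data-scarce, related task
ys = Xs @ (w_true + 0.1) + rng.normal(scale=0.1, size=5)
print(np.round(cross_learn([(Xb, yb), (Xs, ys)]), 2)) # both rows near w_true
```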
arXiv:2511.13653v1 Announce Type: new Abstract: Finding human-understandable circuits in language models is a central goal of the field of mechanistic interpretability. We train models to have more understandable circuits by constraining most of their weights to be zeros, so that each neuron only has a few connections. To recover fine-grained circuits underlying each of several hand-crafted tasks, we prune the models to isolate the part responsible for the task. These circuits often contain neurons and residual channels that correspond to natural concepts, with a small number of straightforwardly interpretable connections between them. We study how these models scale and find that making weights sparser trades off capability for interpretability, and scaling model size improves the capability-interpretability frontier. However, scaling sparse models beyond tens of millions of nonzero parameters while preserving interpretability remains a challenge. In addition to training weight-sparse models de novo, we show preliminary results suggesting our method can also be adapted to explain existing dense models. Our work produces circuits that achieve an unprecedented level of human understandability and validates them with considerable rigor.
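The weight-sparsity constraint itself is simple to emulate: after each optimizer step, keep only the top-c incoming weights per neuron so every unit retains just a few connections. This toy PyTorch loop shows the constraint, not the paper's training recipe.

```python
# Enforce a per-neuron fan-in budget by magnitude after each step.
import torch

def enforce_fanin(linear: torch.nn.Linear, c: int) -> None:
    with torch.no_grad():
        W = linear.weight                               # shape (out, in)
        thresh = W.abs().topk(c, dim=1).values[:, -1:]  # c-th largest per row
        W.mul_((W.abs() >= thresh).float())             # zero the rest

layer = torch.nn.Linear(64, 32)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x, y = torch.randn(128, 64), torch.randn(128, 32)
for _ in range(20):
    opt.zero_grad()
    torch.nn.functional.mse_loss(layer(x), y).backward()
    opt.step()
    enforce_fanin(layer, c=4)                 # each neuron keeps ~4 inputs
print((layer.weight != 0).sum(dim=1)[:5])     # ~4 nonzeros per row
```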
arXiv:2511.13645v1 Announce Type: new Abstract: We present FuseSampleAgg, a CUDA operator that fuses neighbor sampling and mean aggregation into a single pass for one and two hop GraphSAGE. By eliminating block materialization and extra kernel launches, FuseSampleAgg reduces memory traffic and overhead while preserving GraphSAGE mean semantics via saved index replay. Across the Reddit, ogbn-arxiv, and ogbn-products benchmarks (batch size 1024, automatic mixed precision enabled), we observe step time speedups up to 51x on ogbn-products, about 4x on Reddit with fanouts 10-10 and 15-10, and about 3.3x on ogbn-arxiv at larger fanouts, with peak GPU memory reductions up to 100x, 36x, and about 3.5x, respectively. The operator is deterministic, integrates with standard PyTorch optimizers, and ships with scripts that reproduce all tables and figures from CSV logs. Code and scripts are available at https://github.com/SV25-22/FuseSampleAgg.
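For reference, the semantics the fused kernel must match can be written as an unfused NumPy pass over a CSR graph: sample up to fanout neighbors per seed and mean-pool their features. FuseSampleAgg computes this in a single kernel without materializing the sampled block; the tiny graph below is illustrative.

```python
# Unfused reference for one-hop GraphSAGE mean aggregation with sampling.
import numpy as np

def sample_mean_agg(indptr, indices, feats, seeds, fanout, rng):
    """Sample up to `fanout` neighbors per seed (CSR graph), mean-pool features."""
    out = np.zeros((len(seeds), feats.shape[1]), dtype=feats.dtype)
    for i, v in enumerate(seeds):
        nbrs = indices[indptr[v]:indptr[v + 1]]
        if len(nbrs) > fanout:
            nbrs = rng.choice(nbrs, size=fanout, replace=False)
        if len(nbrs):
            out[i] = feats[nbrs].mean(axis=0)
    return out

rng = np.random.default_rng(0)
indptr = np.array([0, 3, 5, 5])          # 3-node CSR graph; node 2 has no neighbors
indices = np.array([1, 2, 2, 0, 1])
feats = rng.normal(size=(3, 4)).astype(np.float32)
print(sample_mean_agg(indptr, indices, feats, seeds=[0, 2], fanout=2, rng=rng))
```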