Room X1.52, Lautrupvang 15,
2750 Ballerup, Denmark
Department of Engineering Technology
Technical University of Denmark
I received my Ph.D. in Computer Science (specializing in Mathematical Optimization) from the Department of Information Technology at Uppsala University in 2019. During the Ph.D., I interned as a visiting data scientist at The Boston Consulting Group (BCG) Gamma. After the Ph.D., I worked as a data scientist at Bolt and Wolt (DoorDash) in the domain of on-demand logistics optimization.
My current research interests are centered around theories of interpretability and efficiency of machine learning models toward On-Device AI. I explore strategies to streamline complex models without performance loss and unravel the intricate mechanisms of decision-making models. Central to this pursuit is understanding the synergy between model simplification and explainability: Reducing a model's complexity aids in elucidating its functions, and concurrently, explainability drives the efficient compression of the learning model.
(See a full list of publications here)
C. Yu, V. Uotila, S. Deng, Q. Wu, T. Shi, S. Jiang, L. You, and B. Zhao, "QUASAR: Quantum Assembly Code Generation Using Tool-Augmented LLMs via Agentic RL", preprint. [arXiv]
Designing and optimizing task-specific quantum circuits are crucial to leverage the advantage of quantum computing. Recent large language model (LLM)-based quantum circuit generation has emerged as a promising automatic solution. However, the fundamental challenges remain unaddressed: (i) parameterized quantum gates require precise numerical values for optimal performance, which also depend on multiple aspects, including the number of quantum gates, their parameters, and the layout/depth of the circuits. (ii) LLMs often generate low-quality or incorrect quantum circuits due to the lack of quantum domain-specific knowledge. We propose QUASAR, an agentic reinforcement learning (RL) framework for quantum circuit generation and optimization based on tool-augmented LLMs. To align the LLM with quantum-specific knowledge and improve the generated quantum circuits, QUASAR designs (i) a quantum circuit verification approach with external quantum simulators and (ii) a sophisticated hierarchical reward mechanism in RL training. Extensive evaluation shows improvements in both syntax and semantic performance of the generated quantum circuits. When augmenting a 4B LLM, QUASAR achieves a validity of 99.31% in Pass@1 and 100% in Pass@10, outperforming industrial LLMs including GPT-4o, GPT-5 and DeepSeek-V3 as well as several supervised-fine-tuning (SFT)-only and RL-only baselines. We release our model at HuggingFace and provide the training code at GitHub.
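For readers unfamiliar with the Pass@1 and Pass@10 figures above, the following is a minimal sketch of the commonly used unbiased pass@k estimator. It is an assumption that QUASAR's evaluation uses this exact estimator; the function name `pass_at_k` and its arguments are illustrative only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n generated samples, of which c are valid.

    Computes 1 - C(n - c, k) / C(n, k): the probability that a random
    subset of k samples (out of the n drawn) contains at least one valid one.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k invalid samples exist, so every k-subset has a valid one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 10 circuits generated for a prompt and 3 of them valid, `pass_at_k(10, 3, 1)` is the fraction of valid samples, and larger `k` can only increase the estimate.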
L. Cao, L. You, and CSPaper Core Team, "CSPaper Review: Fast, Rubric-Faithful Conference Feedback", International Natural Language Generation Conference (INLG) 2025, accepted. [paper] [demo] [discussion]
CSPaper Review (CSPR) is a free, AI-powered tool for rapid, conference-specific peer review in Computer Science (CS). Addressing the bottlenecks of slow, inconsistent, and generic feedback in existing solutions, CSPR leverages Large Language Model (LLM) agents and tailored workflows to deliver realistic and actionable reviews within one minute. In merely four weeks, it served more than 7,000 unique users from 80 countries and processed over 15,000 reviews, highlighting a strong demand from the CS community. We present our architecture, design choices, benchmarks, user analytics and future roadmaps.
Counterfactual explanations (CE) identify data points that closely resemble the observed data but produce different machine learning (ML) model outputs, offering critical insights into model decisions. Despite the diverse scenarios, goals and tasks to which they are tailored, existing CE methods often lack actionable efficiency because of unnecessary feature changes included within the explanations that are presented to users and stakeholders. We address this problem by proposing a method that minimizes the required feature changes while maintaining the validity of CE, without imposing restrictions on models or CE algorithms, whether instance- or group-based. The key innovation lies in computing a joint distribution between observed and counterfactual data and leveraging it to inform Shapley values for feature attributions (FA). We demonstrate that optimal transport (OT) effectively derives this distribution, especially when the alignment between observed and counterfactual data is unclear in the CE methods used. Additionally, a counterintuitive finding is uncovered: it may be misleading to rely on an exact alignment defined by the CE generation mechanism when conducting FA. Our proposed method is validated by extensive experiments across multiple datasets, showcasing its effectiveness in refining CE towards greater actionable efficiency.
Counterfactual explanations (CE) are the de facto method for providing insights into black-box decision-making models by identifying alternative inputs that lead to different outcomes. However, existing CE approaches, including group and global methods, focus predominantly on specific input modifications, lacking the ability to capture nuanced distributional characteristics that influence model outcomes across the entire input-output spectrum. This paper proposes distributional counterfactual explanation (DCE), shifting focus to the distributional properties of observed and counterfactual data, thus providing broader insights. DCE is particularly beneficial for stakeholders making strategic decisions based on statistical data analysis, as it makes the statistical distribution of the counterfactual resemble that of the factual while aligning model outputs with a target distribution, something that existing CE methods cannot fully achieve. We leverage optimal transport (OT) to formulate a chance-constrained optimization problem, deriving a counterfactual distribution aligned with its factual counterpart, supported by statistical confidence. The efficacy of this approach is demonstrated through experiments, highlighting its potential to provide deeper insights into decision-making models.
Z. Senane, L. Cao, V. L. Buchner, Y. Tashiro, L. You, P. Herman, M. Nordahl, R. Tu, and V. Ehrenheim, "Self-Supervised Learning of Time Series Representation via Diffusion Process and Imputation-Interpolation-Forecasting Mask", Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD) 2024. [arXiv] [code]
Time Series Representation Learning (TSRL) focuses on generating informative representations for various Time Series (TS) modeling tasks. Traditional Self-Supervised Learning (SSL) methods in TSRL fall into four main categories: reconstructive, adversarial, contrastive, and predictive, each with a common challenge of sensitivity to noise and intricate data nuances. Recently, diffusion-based methods have shown advanced generative capabilities. However, they primarily target specific application scenarios like imputation and forecasting, leaving a gap in leveraging diffusion models for generic TSRL. Our work, Time Series Diffusion Embedding (TSDE), bridges this gap as the first diffusion-based SSL TSRL approach. TSDE segments TS data into observed and masked parts using an Imputation-Interpolation-Forecasting (IIF) mask. It applies a trainable embedding function, featuring dual-orthogonal Transformer encoders with a crossover mechanism, to the observed part. We train a reverse diffusion process conditioned on the embeddings, designed to predict noise added to the masked part. Extensive experiments demonstrate TSDE’s superiority in imputation, interpolation, forecasting, anomaly detection, classification, and clustering. We also conduct an ablation study, present embedding visualizations, and compare inference speed, further substantiating TSDE’s efficiency and validity in learning representations of TS data.
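To make the split into observed and masked parts concrete, the sketch below builds three kinds of boolean masks over a multivariate series. The specific semantics are assumptions for illustration (imputation hides random entries, interpolation hides whole time steps, forecasting hides the tail), as are the function name `iif_mask` and the `ratio` parameter; the actual IIF mask in TSDE may be constructed differently.

```python
import numpy as np


def iif_mask(n_steps, n_features, kind, ratio=0.2, rng=None):
    """Return a boolean (n_steps, n_features) array: True = observed, False = masked."""
    rng = np.random.default_rng(rng)
    observed = np.ones((n_steps, n_features), dtype=bool)
    n_hidden_steps = max(1, int(round(ratio * n_steps)))
    if kind == "imputation":
        # Hide individual entries independently at random.
        observed &= rng.random((n_steps, n_features)) >= ratio
    elif kind == "interpolation":
        # Hide all features at randomly chosen time steps.
        hidden = rng.choice(n_steps, size=n_hidden_steps, replace=False)
        observed[hidden, :] = False
    elif kind == "forecasting":
        # Hide all features over the final time steps.
        observed[n_steps - n_hidden_steps:, :] = False
    else:
        raise ValueError(f"unknown mask kind: {kind!r}")
    return observed
```

Under these assumptions, the embedding function would see only entries where the mask is True, and the denoising objective would be evaluated on entries where it is False.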
This study addresses a problem in neural network pruning: the gradients used to compute the empirical Fisher Information Matrix (FIM) are inaccurate. We introduce SWAP, an Entropic Wasserstein regression (EWR) network pruning formulation, capitalizing on the geometric attributes of the optimal transport (OT) problem. The “swap” of the commonly used standard linear regression (LR) with the EWR in optimization is shown analytically to excel in noise mitigation by adopting neighborhood interpolation across data points, while incurring only marginal extra computational cost. The unique strength of SWAP is its intrinsic ability to strike a balance between noise reduction and covariance information preservation. Extensive experiments on various networks show that SWAP performs comparably to state-of-the-art (SoTA) network pruning algorithms. Our proposed method outperforms the SoTA when the network size or the target sparsity is large, and the gain is even larger in the presence of noisy gradients, possibly from noisy data, analog memory, or adversarial attacks. Notably, our proposed method achieves a 6% improvement in accuracy and an 8% improvement in testing loss for MobileNetV1 with less than one-fourth of the network parameters remaining.
The paper investigates the weighted sum-rate maximization (WSRM) problem with latent interfering sources outside the known network, whose power allocation policy is hidden from, and cannot be controlled by, the optimization. The paper extends the well-known alternating optimization algorithm, weighted minimum mean square error (WMMSE), under a causal inference framework to tackle WSRM. Specifically, with the possibility of power policy shifting in the hidden network, computing an iterating direction based only on the observed interference inherently implies that the counterfactual is ignored in decision making. A method called synthetic control (SC) is used to estimate the counterfactual. For any link in the known network, SC constructs a convex combination of the interference on other links and uses it as an estimate for the counterfactual. Power iteration in the proposed SC-WMMSE is performed taking into account both the observed interference and its counterfactual. SC-WMMSE requires no more information than the original WMMSE in the optimization stage. To the best of our knowledge, this is the first paper that explores the potential of SC in assisting mathematical optimization in addressing classic wireless optimization problems. Numerical results suggest the superiority of SC-WMMSE over the original in both convergence and objective.
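The convex-combination step of synthetic control can be sketched as a least-squares fit with weights constrained to the probability simplex. The version below uses projected gradient descent; the solver choice, the function names, and the data layout (`donors` holding interference histories of the other links, `target` the history of the link of interest) are assumptions for illustration rather than the paper's implementation.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def synthetic_control_weights(donors, target, n_iters=5000):
    """Fit convex weights w minimizing 0.5 * ||donors @ w - target||^2.

    donors: (T, J) interference observed on J other links over T steps.
    target: (T,) interference observed on the link of interest.
    """
    donors = np.asarray(donors, dtype=float)
    target = np.asarray(target, dtype=float)
    n_donors = donors.shape[1]
    w = np.full(n_donors, 1.0 / n_donors)
    # Step size 1/L, where L is the squared spectral norm of donors.
    step = 1.0 / np.linalg.norm(donors, 2) ** 2
    for _ in range(n_iters):
        grad = donors.T @ (donors @ w - target)
        w = project_to_simplex(w - step * grad)
    return w
```

Given fitted weights `w`, the counterfactual interference on the link at a later step would be estimated as `new_donor_row @ w`, where `new_donor_row` holds the interference then observed on the other links.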