
Room X1.52, Lautrupvang 15, 

2750 Ballerup, Denmark

Lei You (Ph.D.)

Assistant Professor in Applied Mathematics

Department of Engineering Technology

Technical University of Denmark

I received my Ph.D. in Computer Science (specializing in Mathematical Optimization) from the Department of Information Technology at Uppsala University in 2019. During my Ph.D., I interned as a visiting data scientist at The Boston Consulting Group (BCG) Gamma. After my Ph.D., I worked as a data scientist at Bolt and Wolt (DoorDash) in the domain of on-demand logistics optimization.

My current research interests are centered around theories of interpretability and efficiency of machine learning models toward On-Device AI. I explore strategies to streamline complex models without performance loss and unravel the intricate mechanisms of decision-making models. Central to this pursuit is understanding the synergy between model simplification and explainability: Reducing a model's complexity aids in elucidating its functions, and concurrently, explainability drives the efficient compression of the learning model.

Recent work


L. You, Y. Bian, and L. Cao, "Refining Counterfactual Explanations With Joint-Distribution-Informed Shapley Towards Actionable Minimality", preprint. [arXiv] [code] [software]

Counterfactual explanations (CE) identify data points that closely resemble the observed data but produce different machine learning (ML) model outputs, offering critical insights into model decisions. Despite the diverse scenarios, goals and tasks to which they are tailored, existing CE methods often lack actionable efficiency because the explanations presented to users and stakeholders include unnecessary feature changes. We address this problem by proposing a method that minimizes the required feature changes while maintaining the validity of CE, without imposing restrictions on models or CE algorithms, whether instance- or group-based. The key innovation lies in computing a joint distribution between observed and counterfactual data and leveraging it to inform Shapley values for feature attributions (FA). We demonstrate that optimal transport (OT) effectively derives this distribution, especially when the alignment between observed and counterfactual data is unclear in the CE methods used. Additionally, we uncover a counterintuitive finding: relying on an exact alignment defined by the CE generation mechanism may be misleading when conducting FA. Our proposed method is validated through extensive experiments across multiple datasets, showcasing its effectiveness in refining CE towards greater actionable efficiency.
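As a rough illustration of how OT can yield a joint distribution between observed and counterfactual points when no alignment is given, the sketch below runs plain Sinkhorn iterations on an entropy-regularized OT problem. It is a generic textbook routine, not the paper's implementation; the function name sinkhorn_plan, the one-dimensional toy data, the squared-distance cost, and the regularization strength are all assumptions made for the example.

```python
import math

def sinkhorn_plan(xs, ys, reg=0.5, n_iters=500):
    """Entropy-regularized OT coupling between two 1-D samples (uniform weights)."""
    n, m = len(xs), len(ys)
    a = [1.0 / n] * n  # marginal over observed points
    b = [1.0 / m] * m  # marginal over counterfactual points
    # Gibbs kernel built from a squared-distance cost (assumed cost function).
    K = [[math.exp(-((x - y) ** 2) / reg) for y in ys] for x in xs]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Joint distribution: plan[i][j] is the mass shared by xs[i] and ys[j].
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy data: the counterfactual points arrive in an unknown (shuffled) order.
observed = [0.0, 1.0, 2.0]
counterfactual = [1.9, 0.1, 1.2]
plan = sinkhorn_plan(observed, counterfactual)
```

In this toy case the largest entry in each row of plan pairs every observed point with its nearest counterfactual point, so the coupling recovers a sensible soft alignment that could then weight a downstream attribution step.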


L. You, L. Cao, M. Nilsson, B. Zhao, and L. Lei, "Distributional Counterfactual Explanation With Optimal Transport", preprint. [arXiv] [code]

Counterfactual explanations (CE) are the de facto method for providing insights into black-box decision-making models by identifying alternative inputs that lead to different outcomes. However, existing CE approaches, including group and global methods, focus predominantly on specific input modifications, lacking the ability to capture nuanced distributional characteristics that influence model outcomes across the entire input-output spectrum. This paper proposes distributional counterfactual explanation (DCE), shifting focus to the distributional properties of observed and counterfactual data, thus providing broader insights. DCE is particularly beneficial for stakeholders making strategic decisions based on statistical data analysis, as it makes the statistical distribution of the counterfactual resemble that of the factual when aligning model outputs with a target distribution, something that existing CE methods cannot fully achieve. We leverage optimal transport (OT) to formulate a chance-constrained optimization problem, deriving a counterfactual distribution aligned with its factual counterpart, supported by statistical confidence. The efficacy of this approach is demonstrated through experiments, highlighting its potential to provide deeper insights into decision-making models.


Z. Senane, L. Cao, V. L. Buchner, Y. Tashiro, L. You, P. Herman, M. Nordahl, R. Tu, and V. Ehrenheim, "Self-Supervised Learning of Time Series Representation via Diffusion Process and Imputation-Interpolation-Forecasting Mask", accepted at the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2024. [arXiv] [code]

Time Series Representation Learning (TSRL) focuses on generating informative representations for various Time Series (TS) modeling tasks. Traditional Self-Supervised Learning (SSL) methods in TSRL fall into four main categories: reconstructive, adversarial, contrastive, and predictive, each with a common challenge of sensitivity to noise and intricate data nuances. Recently, diffusion-based methods have shown advanced generative capabilities. However, they primarily target specific application scenarios like imputation and forecasting, leaving a gap in leveraging diffusion models for generic TSRL. Our work, Time Series Diffusion Embedding (TSDE), bridges this gap as the first diffusion-based SSL TSRL approach. TSDE segments TS data into observed and masked parts using an Imputation-Interpolation-Forecasting (IIF) mask. It applies a trainable embedding function, featuring dual-orthogonal Transformer encoders with a crossover mechanism, to the observed part. We train a reverse diffusion process conditioned on the embeddings, designed to predict noise added to the masked part. Extensive experiments demonstrate TSDE’s superiority in imputation, interpolation, forecasting, anomaly detection, classification, and clustering. We also conduct an ablation study, present embedding visualizations, and compare inference speed, further substantiating TSDE’s efficiency and validity in learning representations of TS data.


L. You and H. V. Cheng, "SWAP: Sparse Entropic WAsserstein Regression for Robust Network Pruning", International Conference on Learning Representations (ICLR) 2024. [arXiv] [code]


This study addresses the problem of inaccurate gradients in computing the empirical Fisher Information Matrix (FIM) for neural network pruning. We introduce SWAP, an Entropic Wasserstein regression (EWR) network pruning formulation, capitalizing on the geometric attributes of the optimal transport (OT) problem. The “swap” of a commonly used standard linear regression (LR) with the EWR in optimization is shown analytically to excel in noise mitigation by adopting neighborhood interpolation across data points, while incurring only marginal extra computational cost. The unique strength of SWAP is its intrinsic ability to strike a balance between noise reduction and covariance information preservation. Extensive experiments on various networks show that SWAP performs comparably to state-of-the-art (SoTA) network pruning algorithms. Our proposed method outperforms the SoTA when the network size or the target sparsity is large, and the gain is even larger in the presence of noisy gradients, possibly from noisy data, analog memory, or adversarial attacks. Notably, our proposed method achieves a 6% improvement in accuracy and an 8% improvement in testing loss for MobileNetV1 with less than one-fourth of the network parameters remaining.

Other Research work (Selected)

(See a full list of publications here)


L. You, "Weighted Sum-Rate Maximization With Causal Inference for Latent Interference Estimation", IEEE International Conference on Communications (ICC) 2023. [link] [code]

The paper investigates the weighted sum-rate maximization (WSRM) problem with latent interfering sources outside the known network, whose power allocation policy is hidden from, and cannot be controlled by, the optimization. The paper extends the well-known alternating optimization algorithm, weighted minimum mean square error (WMMSE), under a causal inference framework to tackle WSRM. Specifically, because the power policy in the hidden network may shift, computing an iterating direction based only on the observed interference inherently implies that the counterfactual is ignored in decision making. A method called synthetic control (SC) is used to estimate the counterfactual. For any link in the known network, SC constructs a convex combination of the interference on other links and uses it as an estimate for the counterfactual. Power iteration in the proposed SC-WMMSE takes into account both the observed interference and its counterfactual. SC-WMMSE requires no more information than the original WMMSE in the optimization stage. To the best of our knowledge, this is the first paper that explores the potential of SC in assisting mathematical optimization for classic wireless optimization problems. Numerical results suggest the superiority of SC-WMMSE over the original in both convergence and objective.
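The convex-combination step of synthetic control can be sketched as follows. This is only a minimal illustration of the generic SC idea, not the SC-WMMSE algorithm: the solver (exponentiated-gradient updates on the probability simplex), the helper name fit_simplex_weights, and the interference values are all hypothetical choices made for the example.

```python
import math

def fit_simplex_weights(target, donors, step=1.0, n_iters=500):
    """Find w >= 0 with sum(w) == 1 that minimizes the mean squared error between
    target[t] and sum_j w[j] * donors[j][t], via exponentiated-gradient updates."""
    k, T = len(donors), len(target)
    w = [1.0 / k] * k  # start from the uniform convex combination
    for _ in range(n_iters):
        resid = [sum(w[j] * donors[j][t] for j in range(k)) - target[t]
                 for t in range(T)]
        grad = [2.0 * sum(resid[t] * donors[j][t] for t in range(T)) / T
                for j in range(k)]
        # The multiplicative update keeps weights positive; renormalizing keeps them summing to 1.
        w = [w[j] * math.exp(-step * grad[j]) for j in range(k)]
        total = sum(w)
        w = [x / total for x in w]
    return w

# Hypothetical interference histories observed on three other links (donors)
# and on the link of interest (target), over four past time slots.
donors = [[2.0, 1.0, 1.0, 2.0],
          [1.0, 2.0, 1.0, 2.0],
          [1.0, 1.0, 2.0, 2.0]]
target = [1.5, 1.3, 1.2, 2.0]

weights = fit_simplex_weights(target, donors)

# Counterfactual estimate for a new slot: apply the same convex combination
# to the interference currently seen on the donor links.
new_donor_values = [3.0, 1.0, 2.0]
estimate = sum(w * x for w, x in zip(weights, new_donor_values))
```

Here the target history was constructed as 0.5, 0.3 and 0.2 times the three donor histories, so the fitted weights approach those values and the estimate approaches 2.2.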


L. You, D. Yuan, L. Lei, S. Sun, S. Chatzinotas, and B. Ottersten. “Resource optimization with load coupling in multi-cell NOMA”. IEEE Transactions on Wireless Communications, vol.17, no.7, 2018. [arXiv] [code] 

Optimizing non-orthogonal multiple access (NOMA) in multi-cell scenarios is much more challenging than the single-cell case because inter-cell interference must be considered. Most papers addressing NOMA consider a single cell. We take a significant step of analyzing NOMA in multi-cell scenarios. We explore the potential of NOMA networks in achieving optimal resource utilization with arbitrary topologies. Towards this goal, we investigate a broad class of problems consisting in optimizing power allocation and user pairing for any cost function that is monotonically increasing in time-frequency resource consumption. We propose an algorithm that achieves global optimality for this problem class. The basic idea is to prove that solving the joint optimization problem of power allocation, user pair selection, and time-frequency resource allocation amounts to solving a so-called iterated function without a closed form. We prove that the algorithm approaches optimality with fast convergence. Numerically, we evaluate and demonstrate the performance of NOMA for multi-cell scenarios in terms of resource efficiency and load balancing. 
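To give a feel for what solving an iterated function without a closed form involves, the sketch below runs a plain fixed-point iteration on a toy two-cell load-coupling map. The map, the parameter values, and the names load_update and fixed_point are assumptions made for illustration; the example does not include NOMA user pairing or power allocation and is not the algorithm proposed in the paper.

```python
import math

def load_update(load, demand, gain, cross_gain, noise=1.0):
    """Apply a toy load-coupling map once (hypothetical model): each cell's load
    grows with the interference generated by the other cells' loads."""
    n = len(load)
    new_load = []
    for i in range(n):
        interference = sum(cross_gain[i][k] * load[k] for k in range(n) if k != i)
        sinr = gain[i] / (interference + noise)
        # Resource share needed to serve the demand at the achievable rate.
        new_load.append(demand[i] / math.log2(1.0 + sinr))
    return new_load

def fixed_point(update, x0, tol=1e-12, max_iters=1000):
    """Iterate x <- update(x) until successive iterates differ by less than tol."""
    x = list(x0)
    for _ in range(max_iters):
        nxt = update(x)
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical normalized parameters for two mutually interfering cells.
demand = [0.5, 0.8]
gain = [10.0, 8.0]
cross_gain = [[0.0, 2.0],
              [1.5, 0.0]]

loads = fixed_point(lambda x: load_update(x, demand, gain, cross_gain), [0.0, 0.0])
```

The returned loads satisfy loads = load_update(loads, ...) up to the tolerance, which is the sense in which the solution is defined only implicitly by the iteration.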


A note that strengthens the result of this paper is as follows.

L. You and D. Yuan. “A note on decoding order in user grouping and power optimization for multi-cell NOMA with load coupling”. IEEE Transactions on Wireless Communications, vol.20 no.1, 2021. [arXiv]


L. You, Q. Liao, N. Pappas, and D. Yuan. “Resource Optimization with Flexible Numerology and Frame Structure for Heterogeneous Services”. IEEE Communications Letters, vol.22, no.12, 2018. [arXiv] [code]

We explore the potential of optimizing resource allocation with flexible numerology in the frequency domain and variable frame structure in the time domain, in the presence of services with different types of requirements. We analyze the computational complexity and propose a scalable optimization algorithm based on searching in both the primal space and the dual space, which are complementary to each other. Numerical results show significant advantages of adopting flexibility in both time and frequency domains for capacity enhancement and meeting the requirements of mission-critical services.


L. You and D. Yuan. “User-centric performance optimization with remote radio head cooperation in C-RAN”, IEEE Transactions on Wireless Communications, vol.19, no.1, 2019. [arXiv]

In a cloud radio access network (C-RAN), distributed remote radio heads (RRHs) are coordinated by baseband units (BBUs) in the cloud. The centralization of signal processing provides flexibility for coordinated multi-point transmission (CoMP) of RRHs to cooperatively serve user equipments (UEs). We target enhancing UEs' capacity performance, by jointly optimizing the selection of RRHs for serving UEs, i.e., resource allocation (and CoMP selection). We analyze the computational complexity of the problem. Next, we prove that under fixed CoMP selection, the optimal resource allocation amounts to solving a so-called iterated function. Towards user-centric network optimization, we propose an algorithm for the joint optimization problem, aiming at maximally scaling up the capacity for any target UE group of interest. The proposed algorithm enables network-level performance evaluation for quality of experience.