leiyo@dtu.dk
Department of Engineering Technology
Technical University of Denmark
I received my Ph.D. in Computer Science (specializing in Mathematical Optimization) from the Department of Information Technology at Uppsala University in 2019. During my Ph.D., I interned as a visiting data scientist at The Boston Consulting Group (BCG) Gamma. After the Ph.D., I worked as a data scientist at Bolt and at Wolt (DoorDash), focusing on on-demand logistics optimization.
My research develops mathematical optimization frameworks for trustworthy and efficient AI. I view modern AI systems as solutions to coupled optimization problems: we optimize not only for predictive accuracy, but also for how models behave under interventions, how individuals move within a population, how sensitive groups correspond to one another, and how dense and sparse models co-evolve as deployments change.
A unifying theme is what I call dependence-structured alignment (DSA): explanations, actions, and guarantees are all derived from the same optimized dependence structure. In practice, this means formulating and solving optimization problems that jointly learn (i) instance–reference couplings for explanation and recourse, (ii) population-level transport maps for distributional shifts, (iii) cross-group matchings for fairness interventions, and (iv) sparse–dense consistency constraints for pruning and model compression.
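As a minimal illustration of the coupling idea behind DSA (not the full framework), the sketch below computes an entropic optimal-transport coupling between a set of instances and a set of reference points; the resulting coupling weights are the kind of dependence structure that explanation, recourse, and fairness steps can then share. The POT library and the toy data are assumptions for illustration.

```python
# Minimal sketch: learn an instance-reference coupling with entropic OT.
# Assumes the POT library (pip install pot); the data here is synthetic.
import numpy as np
import ot

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))           # observed instances
R = rng.normal(loc=1.0, size=(20, 4))  # reference (e.g. counterfactual) points

a = np.full(len(X), 1.0 / len(X))      # uniform weights on instances
b = np.full(len(R), 1.0 / len(R))      # uniform weights on references

M = ot.dist(X, R)                      # squared-Euclidean cost matrix
P = ot.sinkhorn(a, b, M, reg=0.1)      # entropic OT coupling, shape (50, 20)

# Each row of P tells us how strongly an instance depends on each reference;
# downstream explanation / recourse steps can reuse this shared structure.
print(P.shape, P.sum())                # (50, 20), total mass ~1.0
```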
(See a full list of publications here)
[Theme #3] Y. Bian, L. You, Y. Sasaki, H. Maeda, and A. Igarashi, "Algorithmic Fairness: Not a Purely Technical but Socio-Technical Property", preprint. [arXiv]
The rapid deployment of artificial intelligence (AI) and machine learning (ML) systems in socially consequential domains has raised growing concerns about their trustworthiness, including potentially discriminatory behaviour. Research in algorithmic fairness has produced a proliferation of mathematical definitions and metrics, yet persistent misconceptions and limitations, both within and beyond the fairness community, undermine their effectiveness: there is no consensus on what fairness means, prevailing measures are tailored primarily to binary group settings, and intersectional contexts are handled only superficially. We critically examine these misconceptions and argue that fairness cannot be reduced to purely technical constraints on models. Through conceptual analysis and empirical illustrations, we show the limited applicability of existing fairness measures in complex real-world scenarios, challenge prevailing views on the incompatibility between accuracy and fairness (and among fairness measures themselves), and outline three principles worth considering in the design of fairness measures. We believe these findings help bridge the gap between technical formalisation and social realities and meet the challenges of real-world AI/ML deployment.
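To make the "binary group settings" limitation concrete, the toy snippet below computes group-wise positive rates when the sensitive attribute has more than two values: with several groups there is no single demographic parity gap, only pairwise or max-min summaries, which is one reason binary-tailored measures translate poorly to intersectional contexts. The data and rates are illustrative assumptions, not results from the paper.

```python
# Toy illustration: demographic parity with more than two sensitive groups.
# Synthetic data; not an experiment from the paper.
import numpy as np

rng = np.random.default_rng(1)
groups = rng.choice(["A", "B", "C"], size=1000, p=[0.5, 0.3, 0.2])
y_hat = (rng.random(1000) < np.select(
    [groups == "A", groups == "B", groups == "C"], [0.60, 0.45, 0.50])).astype(int)

rates = {g: y_hat[groups == g].mean() for g in np.unique(groups)}
print("positive rates per group:", rates)

# With k > 2 groups, "the" demographic parity gap is ambiguous;
# one common summary is the max-min spread over groups.
print("max-min DP spread:", max(rates.values()) - min(rates.values()))
```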
The rise of the machine learning (ML) model economy has intertwined the markets for training datasets and pre-trained models. However, most pricing approaches still treat data and model transactions separately or rely on broker-centric pipelines that favor one side. Recent studies of data markets with externalities capture buyer interactions but do not yield a simultaneous, symmetric mechanism across data sellers, model producers, and model buyers. We propose a unified data–model coupled market that treats dataset and model trading as a single system. A supply-side mapping transforms dataset payments into buyer-visible model quotations, while a demand-side mapping propagates buyer prices back to datasets through Shapley-based allocation. Together they form a closed loop linking four interactions: supply–demand propagation in both directions and mutual coupling among buyers and among sellers. We prove that the joint operator is a standard interference function (SIF), guaranteeing existence, uniqueness, and global convergence of equilibrium prices. Experiments demonstrate efficient convergence and improved fairness compared with broker-centric and one-sided baselines.
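The convergence argument is a classic fixed-point story: if the joint pricing operator satisfies the positivity, monotonicity, and scalability conditions of a standard interference function, then simply iterating it converges to the unique equilibrium. The sketch below illustrates this with a toy affine operator, which is an assumption for illustration and not the paper's actual coupled data-model operator.

```python
# Toy illustration of fixed-point price iteration under a standard
# interference function (SIF). The affine operator below is illustrative only.
import numpy as np

A = np.array([[0.2, 0.1, 0.0],
              [0.1, 0.3, 0.1],
              [0.0, 0.2, 0.2]])   # nonnegative cross-price influence
b = np.array([1.0, 0.5, 2.0])     # strictly positive base prices

def I(p):
    # An affine map with A >= 0 and b > 0 is positive, monotone, and scalable,
    # hence a standard interference function (Yates, 1995).
    return A @ p + b

p = np.zeros(3)
for k in range(100):
    p_next = I(p)
    if np.max(np.abs(p_next - p)) < 1e-10:
        break
    p = p_next

print("equilibrium prices:", p_next, "after", k + 1, "iterations")
```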
[Theme #1] L. Cao ✉, L. You ✉, and CSPaper Core Team, "CSPaper Review: Fast, Rubric-Faithful Conference Feedback", International Natural Language Generation Conference (INLG) 2025. [paper] [demo] [discussion]
CSPaper Review (CSPR) is a free, AI-powered tool for rapid, conference-specific peer review in Computer Science (CS). Addressing the bottlenecks of slow, inconsistent, and generic feedback in existing solutions, CSPR leverages Large Language Model (LLM) agents and tailored workflows to deliver realistic and actionable reviews within one minute. In just four weeks, it served more than 7,000 unique users from 80 countries and processed over 15,000 reviews, highlighting strong demand from the CS community. We present our architecture, design choices, benchmarks, user analytics, and future roadmap.
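The core design idea, as described, is to decompose review into rubric-specific LLM agents and merge their outputs. The sketch below shows one way such a workflow could be wired up; `call_llm` is a placeholder for whatever model API is used, and the rubric items are illustrative, not CSPR's actual prompts or architecture.

```python
# Hedged sketch of a rubric-driven, multi-agent review workflow.
# `call_llm` stands in for an actual LLM API call; this is NOT CSPR's implementation.
from dataclasses import dataclass

RUBRIC = ["novelty", "soundness", "clarity", "reproducibility"]  # illustrative

@dataclass
class Review:
    criterion: str
    comments: str

def call_llm(prompt: str) -> str:
    # Placeholder: plug in any chat-completion client here.
    raise NotImplementedError

def review_paper(paper_text: str, venue: str) -> list[Review]:
    reviews = []
    for criterion in RUBRIC:
        prompt = (
            f"You are a {venue} reviewer. Assess the paper below strictly on "
            f"'{criterion}' and give concrete, actionable feedback.\n\n{paper_text}"
        )
        reviews.append(Review(criterion, call_llm(prompt)))
    return reviews
```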
Ensuring fairness in machine learning models is critical, particularly in high-stakes domains where biased decisions can lead to serious societal consequences. However, existing preprocessing approaches generally lack transparent mechanisms for identifying which features are responsible for unfairness, which obscures the rationale behind data modifications. We introduce FairSHAP, a novel preprocessing framework that leverages Shapley value attribution to improve both individual and group fairness. FairSHAP identifies fairness-critical features in the training data using an interpretable measure of feature importance and systematically modifies them through instance-level matching across sensitive groups. Our method directly reduces discriminative risk (DR) with solid theoretical guarantees, while simultaneously bounding demographic parity (DP) from above, which in practice leads to its reduction. Experiments on multiple tabular datasets show that we achieve state-of-the-art or comparable performance across DR, DP, and EO with minimal modifications, thereby preserving data fidelity. As a model-agnostic and transparent method, FairSHAP integrates seamlessly into existing machine learning pipelines and provides actionable insights into the sources of bias. Our code is available at https://github.com/youlei202/FairSHAP.
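A hedged sketch of the preprocessing idea follows: attribute feature importance with Shapley values (here via the `shap` package), match each instance in one sensitive group to a nearest neighbour in the other group, and nudge the most fairness-critical feature toward its matched value. The single-feature choice, the attribution-gap score, and the blending step are simplifications for illustration, not FairSHAP's exact procedure (see the paper and repository for that).

```python
# Simplified sketch of Shapley-guided preprocessing with cross-group matching.
# Not FairSHAP's exact algorithm.
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors

def shapley_guided_repair(X, y, sensitive, blend=0.5):
    """X: (n, d) float array, y: binary labels, sensitive: 0/1 group indicator."""
    model = RandomForestClassifier(random_state=0).fit(X, y)
    explainer = shap.Explainer(model.predict, X)   # model-agnostic Shapley estimates
    sv = explainer(X).values                       # (n, d) attributions

    g0, g1 = (sensitive == 0), (sensitive == 1)
    # Stand-in for a fairness-critical feature score: the attribution gap between groups.
    j = int(np.argmax(np.abs(sv[g0].mean(axis=0) - sv[g1].mean(axis=0))))

    # Instance-level matching: each group-0 instance to its nearest group-1 neighbour,
    # then nudge the critical feature toward the matched value.
    _, idx = NearestNeighbors(n_neighbors=1).fit(X[g1]).kneighbors(X[g0])
    X_repaired = X.copy()
    X_repaired[g0, j] = (1 - blend) * X[g0, j] + blend * X[g1][idx[:, 0], j]
    return X_repaired
```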
Counterfactual explanations (CE) identify data points that closely resemble the observed data but produce different machine learning (ML) model outputs, offering critical insights into model decisions. Despite the diverse scenarios, goals, and tasks to which they are tailored, existing CE methods often lack actionable efficiency because the explanations presented to users and stakeholders include unnecessary feature changes. We address this problem by proposing a method that minimizes the required feature changes while maintaining the validity of CE, without imposing restrictions on the model or CE algorithm, whether instance- or group-based. The key innovation lies in computing a joint distribution between observed and counterfactual data and leveraging it to inform Shapley values for feature attribution (FA). We demonstrate that optimal transport (OT) effectively derives this distribution, especially when the alignment between observed and counterfactual data is unclear in the CE method used. We also uncover a counterintuitive finding: relying on the exact alignment defined by the CE generation mechanism can be misleading when conducting FA. The proposed method is validated through extensive experiments across multiple datasets, showcasing its effectiveness in refining CE toward greater actionable efficiency.
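A minimal sketch of the coupling step: compute an optimal-transport plan between the observed points and the counterfactual points, then use each row of the plan as soft correspondence weights when measuring per-feature changes. The actual method feeds this joint distribution into Shapley value computation; the aggregation shown here is a simplified proxy, and the POT library and toy interface are assumptions.

```python
# Sketch: use an OT coupling between observed and counterfactual data to define
# soft correspondences for feature attribution. Simplified proxy for the paper's
# Shapley-based procedure; assumes the POT library.
import numpy as np
import ot

def coupled_feature_shift(X_obs, X_cf):
    n, m = len(X_obs), len(X_cf)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    P = ot.emd(a, b, ot.dist(X_obs, X_cf))       # exact OT plan, shape (n, m)

    # For each observed point, the coupling-weighted counterfactual target.
    targets = (P @ X_cf) / P.sum(axis=1, keepdims=True)
    # Per-feature average change implied by the coupling (proxy attribution).
    return np.abs(targets - X_obs).mean(axis=0)
```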
Counterfactual explanations (CE) are the de facto method for providing insights into black-box decision-making models by identifying alternative inputs that lead to different outcomes. However, existing CE approaches, including group and global methods, focus predominantly on specific input modifications and lack the ability to capture nuanced distributional characteristics that influence model outcomes across the entire input-output spectrum. This paper proposes distributional counterfactual explanation (DCE), which shifts the focus to the distributional properties of observed and counterfactual data, thus providing broader insights. DCE is particularly beneficial for stakeholders making strategic decisions based on statistical data analysis, as it makes the statistical distribution of the counterfactual resemble that of the factual while aligning model outputs with a target distribution, something existing CE methods cannot fully achieve. We leverage optimal transport (OT) to formulate a chance-constrained optimization problem that derives a counterfactual distribution aligned with its factual counterpart, supported by statistical confidence. The efficacy of this approach is demonstrated through experiments, highlighting its potential to provide deeper insights into decision-making models.
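As a small diagnostic in the same spirit (not the paper's chance-constrained OT program), one can check how close a counterfactual output distribution is to a target distribution with a 1-D Wasserstein distance and a bootstrap confidence interval; the function names and interval construction below are illustrative assumptions.

```python
# Sketch: distance between counterfactual and target output distributions,
# with a bootstrap confidence interval. A diagnostic stand-in, not the paper's
# chance-constrained OT formulation.
import numpy as np
from scipy.stats import wasserstein_distance

def wasserstein_with_ci(y_cf, y_target, n_boot=1000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    point = wasserstein_distance(y_cf, y_target)
    boots = [
        wasserstein_distance(
            rng.choice(y_cf, size=len(y_cf), replace=True),
            rng.choice(y_target, size=len(y_target), replace=True),
        )
        for _ in range(n_boot)
    ]
    lo, hi = np.quantile(boots, [alpha / 2, 1 - alpha / 2])
    return point, (lo, hi)
```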
This study tackles the issue in neural network pruning that arises when the empirical Fisher Information Matrix (FIM) is computed from inaccurate gradients. We introduce SWAP, an entropic Wasserstein regression (EWR) formulation of network pruning that capitalizes on the geometric attributes of the optimal transport (OT) problem. The "swap" of the commonly used standard linear regression (LR) for EWR in the optimization is shown analytically to excel at noise mitigation by interpolating across neighboring data points, while incurring only marginal extra computational cost. The unique strength of SWAP is its intrinsic ability to balance noise reduction against the preservation of covariance information. Extensive experiments on various networks show that SWAP performs comparably with state-of-the-art (SoTA) network pruning algorithms and outperforms the SoTA when the network size or the target sparsity is large; the gain is even larger in the presence of noisy gradients, which may arise from noisy data, analog memory, or adversarial attacks. Notably, our proposed method achieves a 6% gain in accuracy and an 8% improvement in test loss for MobileNetV1 with less than one-fourth of the network parameters remaining.
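The entropic OT problem at the heart of the EWR objective is typically solved with Sinkhorn iterations; the minimal NumPy version below shows the core scaling updates. The full SWAP pipeline, which wraps this solver inside a regression over layer activations for pruning, is not reproduced here.

```python
# Minimal Sinkhorn solver for entropic OT, the building block of the EWR
# objective. Not the full SWAP pruning pipeline.
import numpy as np

def sinkhorn(a, b, M, reg=0.1, n_iter=500, tol=1e-9):
    """a, b: marginal weights; M: cost matrix; returns the coupling matrix."""
    K = np.exp(-M / reg)                  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        u_prev = u
        v = b / (K.T @ u)                 # scale columns to match marginal b
        u = a / (K @ v)                   # scale rows to match marginal a
        if np.max(np.abs(u - u_prev)) < tol:
            break
    return u[:, None] * K * v[None, :]    # diag(u) K diag(v)
```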