Abstract:
With rapid urbanization and the continuous expansion of urban water infrastructure, pollutant source identification in pipe networks has become a critical task for water environment management, risk control, and emergency response. Contamination events in drainage and water distribution networks are often hidden, transient, and uncertain, and the source location, release time, duration, and intensity are usually unknown. Furthermore, monitoring points are frequently limited and sensor data often contain noise or missing values, which makes pollutant source tracing a typical inverse problem. Numerical inversion methods provide a framework for reconstructing source information from hydraulic, water-quality, and monitoring data. This review summarizes recent research progress on numerical inversion methods for pollutant source identification in urban pipe networks. Four major categories of methods are discussed: mechanistic model-based optimization, probabilistic methods, data assimilation, and surrogate model-based methods. Mechanistic model-based optimization methods use hydraulic and water-quality transport models as forward simulators and estimate source parameters by minimizing the discrepancy between simulated and observed responses. They provide strong physical interpretability and quantitative source information, but they usually require repeated forward simulations, which incurs high computational cost in large-scale networks. Probabilistic methods describe source parameters with probability distributions; they can quantify uncertainty and are well suited to inverse problems with measurement errors and non-unique solutions, though their performance depends heavily on prior information, likelihood functions, and sampling efficiency.
Data assimilation methods combine model predictions with real-time or quasi-real-time observations to dynamically update system states and source parameters, which makes them well suited to online tracking, although they depend on reliable sensor configurations. Surrogate model-based methods use machine learning or deep learning to approximate source-response relationships, which substantially reduces computational cost and enables rapid identification in large-scale networks; however, their accuracy is constrained by the quality of training samples, and their physical interpretability remains limited. In summary, these methods exhibit distinct trade-offs in physical interpretability, computational efficiency, uncertainty quantification, and real-time applicability. Mechanistic model-based optimization is best suited to high-confidence offline analysis, probabilistic methods to risk-based decision-making, data assimilation to online dynamic tracking, and surrogate models to rapid screening and early warning. Future research should focus on integrating mechanistic models with data-driven approaches, advancing the application of deep learning and graph neural networks, developing real-time online source identification systems, and improving robustness across diverse network topologies and pollution scenarios. These advances will provide stronger technical support for precise pollutant tracing, contaminant control, and informed decision-making in urban pipe networks.
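
As a minimal illustration of the mechanistic model-based optimization idea summarized above (a forward simulator plus minimization of the simulated-versus-observed discrepancy), the following Python sketch may be helpful. Everything in it is a hypothetical toy and not taken from any reviewed study: the single-line network, the constants `N_NODES`, `N_STEPS`, and `DECAY`, and the functions `forward` and `identify_source` are assumptions made only for this example. A real application would replace `forward` with a hydraulic and water-quality simulator.

```python
import math

# Hypothetical toy settings: 6 junctions in a single line, 12 time steps.
N_NODES, N_STEPS, DECAY = 6, 12, 0.05


def forward(source_node, release_step, strength):
    """Toy forward simulator (assumed, not a real hydraulic model).

    An instantaneous pulse released at `source_node` travels one node
    downstream per time step and decays exponentially with distance.
    Returns conc[node][step].
    """
    conc = [[0.0] * N_STEPS for _ in range(N_NODES)]
    for node in range(source_node, N_NODES):
        distance = node - source_node
        step = release_step + distance
        if step < N_STEPS:
            conc[node][step] = strength * math.exp(-DECAY * distance)
    return conc


def identify_source(observed, sensors):
    """Rank candidate (node, step) pairs by squared misfit at the sensors.

    The simulated response is linear in the source strength, so the best
    strength for each candidate has a closed-form least-squares solution.
    Each candidate costs one forward simulation.
    """
    ranked = []
    for node in range(N_NODES):
        for step in range(N_STEPS):
            unit = forward(node, step, 1.0)  # unit-strength response
            pairs = [(unit[s][t], observed[s][t])
                     for s in sensors for t in range(N_STEPS)]
            norm = sum(u * u for u, _ in pairs)
            if norm == 0.0:
                continue  # candidate is invisible to every sensor
            strength = sum(u * y for u, y in pairs) / norm
            misfit = sum((strength * u - y) ** 2 for u, y in pairs)
            ranked.append((misfit, node, step, strength))
    ranked.sort()  # smallest misfit first
    return ranked


# Synthetic, noise-free "observations" generated by the same toy model.
truth = forward(source_node=1, release_step=2, strength=10.0)
sensors = [3, 5]
observed = {s: truth[s] for s in sensors}
ranked = identify_source(observed, sensors)

for misfit, node, step, strength in ranked[:5]:
    print(f"node={node} step={step} strength={strength:.3f} misfit={misfit:.2e}")
```

In this toy setting the exhaustive search needs one forward run per candidate, which hints at why the cost grows quickly in large networks. With only two downstream sensors, more than one candidate upstream of the first sensor reproduces the observations essentially exactly, which is a small-scale instance of the non-uniqueness that probabilistic methods are designed to quantify.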