In the context of smart cities, ambient sensors such as CCTV cameras continuously produce large data feeds on a daily basis, which are very challenging for human operators to monitor. The rich behavioural and contextual information in the raw visual data can be analysed automatically by algorithms that raise alerts when an abnormal pattern is observed. Automatic anomaly detection from video streams is therefore a fundamental task in building secure public places. Our unit develops such technology, focusing in particular on transfer learning strategies that can effectively address anomaly detection in various contexts, e.g. when scenes differ or when the patterns of anomaly change over time.

We have developed AnomalyCLIP, a solution for video anomaly recognition. AnomalyCLIP not only detects anomalous events but also recognizes the underlying activities, providing more informative and actionable insights.

Additionally, we have developed Language-based Video Anomaly Detection (LAVAD), a method that addresses video anomaly detection (VAD) without requiring training or data collection. LAVAD leverages the capabilities of pre-trained large language models and existing vision-language models. Training-free VAD is essential for deploying VAD systems in real-world settings where data collection may not be feasible.
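
To make the training-free idea more concrete, the following is a minimal sketch and not the LAVAD implementation. It assumes a pipeline in which a vision-language model describes each frame in text and a large language model maps each description to an anomaly score, with no parameters learned. The names `score_video`, `caption_frame`, `score_caption`, `toy_captioner` and `toy_scorer` are hypothetical, and the two toy functions are simple stand-ins for the pre-trained models.

```python
from typing import Callable, List, Sequence


def score_video(
    frames: Sequence[object],
    caption_frame: Callable[[object], str],
    score_caption: Callable[[str], float],
    window: int = 3,
) -> List[float]:
    """Return one smoothed anomaly score in [0, 1] per frame.

    caption_frame stands in for a pre-trained vision-language model and
    score_caption for a pre-trained large language model (both assumed).
    Nothing is trained here: the models are only queried.
    """
    # Step 1: describe every frame in natural language.
    captions = [caption_frame(frame) for frame in frames]
    # Step 2: turn each description into a score, clamped to [0, 1].
    raw = [min(1.0, max(0.0, score_caption(c))) for c in captions]
    # Step 3: average over a centred temporal window to damp noisy scores.
    half = window // 2
    smoothed = []
    for i in range(len(raw)):
        lo, hi = max(0, i - half), min(len(raw), i + half + 1)
        smoothed.append(sum(raw[lo:hi]) / (hi - lo))
    return smoothed


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_captioner(frame: object) -> str:
    return str(frame)


def toy_scorer(caption: str) -> float:
    return 1.0 if "fight" in caption else 0.0


frames = [
    "people walking",
    "people walking",
    "two people fight",
    "two people fight",
    "people walking",
]
scores = score_video(frames, toy_captioner, toy_scorer)
# Raise an alert for frames whose smoothed score exceeds a chosen threshold.
alerts = [i for i, s in enumerate(scores) if s > 0.5]
```

Passing the two models in as callables keeps the sketch independent of any particular model or library; in a real deployment they would wrap calls to the chosen pre-trained models, and the window size and alert threshold would need to be chosen for the setting.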