Visual object tracking aims to associate detections that correspond to the same object across frames. In the context of smart cities, it is a fundamental visual task that enables further analysis of human and crowd behaviours for urban monitoring. However, maintaining accurate tracking over long periods without ID switches, or in the presence of missing detections, remains an open challenge due to scene complexity, clutter, similar appearance between objects and varying viewpoints.
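
To make the association step concrete, the following is a minimal sketch of one generic baseline, greedy matching on bounding-box overlap (IoU); it is not the method proposed in this work. The names `iou` and `associate`, the `(x1, y1, x2, y2)` box format and the `min_iou` threshold of 0.3 are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(
    tracks: Dict[int, Box],
    detections: List[Box],
    min_iou: float = 0.3,  # illustrative threshold, not taken from the source
) -> Dict[int, int]:
    """Greedily map each detection index to a track ID.

    `tracks` holds the last known box of every existing track ID.
    Detections that match no track receive a fresh ID.
    """
    # Score every (track, detection) pair and visit the best overlaps first.
    pairs = sorted(
        (
            (iou(box, det), tid, j)
            for tid, box in tracks.items()
            for j, det in enumerate(detections)
        ),
        reverse=True,
    )
    assigned: Dict[int, int] = {}
    used_tracks = set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # all remaining pairs overlap too little
        if tid in used_tracks or j in assigned:
            continue  # each track and each detection is matched at most once
        assigned[j] = tid
        used_tracks.add(tid)

    # Unmatched detections start new tracks.
    next_id = max(tracks, default=-1) + 1
    for j in range(len(detections)):
        if j not in assigned:
            assigned[j] = next_id
            next_id += 1
    return assigned


if __name__ == "__main__":
    previous = {0: (0, 0, 10, 10), 1: (50, 50, 60, 60)}
    current = [(51, 50, 61, 60), (1, 0, 11, 10), (100, 100, 110, 110)]
    print(associate(previous, current))
```

A purely geometric rule of this kind illustrates why the challenges above arise: when a detection is missing for several frames, or when two similar-looking objects overlap, box overlap alone cannot decide which identity to keep, and an ID switch results. The sketch also leaves unmatched tracks untouched; a practical tracker would need a policy for keeping or terminating them.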