copsal-weakly-stvg: Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding