Abstract: Weakly-supervised temporal action localization aims to localize action instances from untrimmed videos with only video-level labels. Due to the lack of frame-wise annotations, most methods ...