Video annotation is the assignment of video shots or segments to predefined semantic concepts such as person, car, sky, or people walking. We present a novel approach to semi-automatic video annotation based on visual attention, which automatically detects the focus of interest, such as salient objects, in video frames. Pre-selecting regions of interest facilitates the recognition of objects that vary in shape, pose, scale, and illumination, and eases the otherwise tedious manual labeling.
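To illustrate how attention-based pre-selection of a region of interest can work, here is a minimal sketch. It assumes grayscale frames stored as 2-D NumPy arrays. It also replaces the paper's visual attention model with a plain center-surround contrast measure: the saliency is the absolute difference between a small local mean and a large neighbourhood mean. The helper names `box_blur`, `saliency_map`, and `select_roi`, the radii, and the 0.5 threshold are all hypothetical illustration choices, not the authors' method.

```python
import numpy as np


def box_blur(img: np.ndarray, radius: int) -> np.ndarray:
    """Mean filter over a (2r+1)x(2r+1) window, computed with an integral image."""
    k = 2 * radius + 1
    # Replicate border pixels so the output has the same shape as the input.
    padded = np.pad(img.astype(float), radius, mode="edge")
    # Integral image with a leading zero row/column: I[a, b] = sum(padded[:a, :b]).
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    total = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
             - integral[k:k + h, :w] + integral[:h, :w])
    return total / (k * k)


def saliency_map(frame: np.ndarray, center_r: int = 1, surround_r: int = 8) -> np.ndarray:
    """Hypothetical center-surround saliency, normalized to [0, 1]."""
    center = box_blur(frame, center_r)
    surround = box_blur(frame, surround_r)
    sal = np.abs(center - surround)
    peak = sal.max()
    return sal / peak if peak > 0 else sal


def select_roi(frame: np.ndarray, threshold: float = 0.5):
    """Bounding box (x0, y0, x1, y1) of pixels whose saliency >= threshold, or None."""
    ys, xs = np.nonzero(saliency_map(frame) >= threshold)
    if ys.size == 0:
        return None  # nothing stands out, e.g. a uniform frame
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


if __name__ == "__main__":
    # Synthetic frame: dark background with one bright 10x10 "object".
    frame = np.zeros((64, 64))
    frame[20:30, 30:40] = 255
    print(select_roi(frame))
```

Only the returned box would be passed to a recognizer or shown to a human annotator, so the rest of the frame never needs labeling. A real system would use a richer attention model, for example one that combines colour, intensity, orientation, and motion cues.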