Paper Detail

Paper ID: SS-MRII.5
Paper Title: A HYBRID TWO-STREAM APPROACH FOR MULTI-PERSON ACTION RECOGNITION IN TOP-VIEW 360 DEGREE VIDEOS
Authors: Karen Stephen, Jianquan Liu, Vivek Barsopia (NEC Corporation, Japan)
Session: SS-MRII: Special Session: Models and Representations for Immersive Imaging
Location: Area A
Session Time: Wednesday, 22 September, 08:00 - 09:30
Presentation Time: Wednesday, 22 September, 08:00 - 09:30
Presentation: Poster
Topic: Special Sessions: Models and Representations for Immersive Imaging
Abstract: Action recognition in top-view 360 degree videos is an emerging research topic in computer vision. Existing work utilizes a global projection method to transform 360 degree video frames into panorama frames for further processing. However, this unwrapping suffers from geometric distortion: people near the centre of the 360 degree video frames appear highly stretched and distorted in the corresponding panorama frames (observed in 37.5% of the total panorama frames for the 360Action dataset). Recognizing the actions of people near the centre therefore becomes difficult, degrading overall action recognition performance. In this work, we overcome this challenge by utilizing distortion-free person-centric images of the people near the centre, extracted directly from the input 360 degree video frames. We propose a simple yet effective hybrid two-stream architecture consisting of a panorama stream and a person-centric stream, where predictions from both streams are combined to detect the overall actions in a video. We validate the efficacy of the proposed method on the recently introduced 360Action dataset and achieve an overall improvement of 2.3% mAP over the state-of-the-art method, with a maximum improvement of 22.7% AP for the pickup action, which occurs mostly near the centre.
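
The late-fusion step of the two-stream architecture can be illustrated with a minimal Python sketch. The abstract only says that predictions from the panorama stream and the person-centric stream are "combined"; the weighted-average rule below, the function name fuse_predictions, and the number of action classes are all illustrative assumptions, not details from the paper.

import numpy as np

# Assumed number of action classes; the abstract does not state this.
NUM_ACTIONS = 19

def fuse_predictions(panorama_scores, person_centric_scores, alpha=0.5):
    # Late fusion by weighted averaging of the two streams' per-class
    # confidence scores. This is one common fusion choice, used here
    # purely for illustration; the paper's exact rule is not given.
    return alpha * panorama_scores + (1.0 - alpha) * person_centric_scores

# Toy example: random vectors stand in for real per-stream network outputs.
rng = np.random.default_rng(0)
panorama_scores = rng.random(NUM_ACTIONS)
person_centric_scores = rng.random(NUM_ACTIONS)

fused = fuse_predictions(panorama_scores, person_centric_scores)
print("Predicted action index:", int(np.argmax(fused)))

Under this sketch, the panorama stream would contribute context for people away from the centre, while the person-centric stream would supply distortion-free evidence for people near the centre; the fused scores then determine the predicted actions.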