BSN: Boundary Sensitive Network for Temporal Action Proposal Generation
Tianwei Lin and Xu Zhao and Haisheng Su and Chongjing Wang and Ming Yang
2018

Paper summary
daisy
## Boundary sensitive network
### **keyword**: action detection in video; accurate proposal
**Summary**: To generate precise temporal boundaries and improve recall with fewer proposals, Tianwei Lin et al. propose BSN, which first combines temporal boundaries of high probability into proposals and then selects proposals by evaluating whether each one contains an action (confidence score plus boundary probabilities).
**Model**:
1. video feature encoding: use the two-stream extractor to form the input of BSN. $F = \{f_{t_n}\}_{n=1}^{l_s} = \{(f_{S,t_n}, f_{T,t_n})\}_{n=1}^{l_s}$
2. BSN:
* temporal evaluation: takes the feature sequence as input and uses a 3-layer CNN, whose last layer has 3 filters with sigmoid activation, to generate the start, end, and actionness probabilities
* proposal generation: 1. pick boundary locations whose start/end probability is high or is a probability peak, and combine them to form proposals; 2. use the actionness probability to generate a proposal feature for each proposal by sampling the actionness probability within the proposal region.
* proposal evaluation: use a perceptron with 1 hidden layer to estimate a confidence score from the proposal features.
A proposal is $\varphi =(t_s,t_e,p_{conf},p_{t_s}^s,p_{t_e}^e)$, where $p_{t_s}^s$ is the start probability, $p_{t_e}^e$ is the end probability, and $p_{conf}$ is the confidence score.
https://i.imgur.com/VjJLQDc.png
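The proposal generation step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the `threshold=0.9` default, and `num_points` are hypothetical choices, and the feature here samples only the proposal region, as described in this summary.

```python
def candidate_boundaries(probs, threshold=0.9):
    """Indices whose probability is high or is a local peak.

    threshold is an illustrative value, not taken from the summary.
    """
    picked = []
    for i, p in enumerate(probs):
        is_high = p > threshold
        is_peak = 0 < i < len(probs) - 1 and probs[i - 1] < p > probs[i + 1]
        if is_high or is_peak:
            picked.append(i)
    return picked


def generate_proposals(start_probs, end_probs, threshold=0.9):
    """Pair every candidate start with every later candidate end."""
    starts = candidate_boundaries(start_probs, threshold)
    ends = candidate_boundaries(end_probs, threshold)
    return [(s, e) for s in starts for e in ends if e > s]


def sample_actionness(actionness, t_start, t_end, num_points=4):
    """Linearly interpolate num_points actionness values over [t_start, t_end]."""
    feature = []
    for k in range(num_points):
        pos = t_start + (t_end - t_start) * k / (num_points - 1)
        lo = int(pos)
        hi = min(lo + 1, len(actionness) - 1)
        frac = pos - lo
        feature.append((1 - frac) * actionness[lo] + frac * actionness[hi])
    return feature


start_probs = [0.1, 0.95, 0.2, 0.1, 0.1, 0.1]
end_probs = [0.1, 0.1, 0.1, 0.3, 0.6, 0.2]
actionness = [0.0, 0.2, 0.8, 0.9, 0.7, 0.1]

proposals = generate_proposals(start_probs, end_probs)
features = [sample_actionness(actionness, s, e) for s, e in proposals]
```

Sampling a fixed number of points gives every proposal a feature of the same length regardless of its duration, which is what lets a single perceptron score all of them.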
**Training**:
* **Learn to generate probability curve**:
To train the probability curves, the loss of the temporal evaluation module is calculated as follows:
$L_{TEM} = \lambda L^{action} + L ^{start} + L^{end}$;
$L = -\frac{1}{l_w} \sum_{i=1}^{l_w}\left(\frac{l_w}{\sum_j g_j}\, b_i \log(p_i)+\frac{l_w}{l_w-\sum_j g_j}\, (1-b_i) \log(1-p_i)\right)$
$b_i = sign(g_i-\theta_{IoP})$, which converts the matching score $g_i$ into a label in $\{0,1\}$
Thus, if a start region overlaps heavily with the ground truth, the start probability at that point must increase to lower the loss. After training, the ground-truth regions have therefore been leveraged to predict an accurate start probability. The actionness and end probabilities follow the same rule.
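A minimal sketch of such a weighted binary logistic loss, to make the balancing concrete. It assumes the positive term carries the weight $l_w / \sum_i g_i$ (so that rare positives count more) and that the loss is negated so it can be minimized; `weighted_bl_loss`, the `theta_iop=0.5` default, and `eps` are hypothetical names and values chosen for illustration.

```python
import math


def weighted_bl_loss(probs, iop_scores, theta_iop=0.5, eps=1e-8):
    """Class-balanced binary logistic loss over one window.

    probs:      predicted probabilities p_i in (0, 1)
    iop_scores: ground-truth matching scores g_i in [0, 1]
    theta_iop:  threshold turning g_i into a 0/1 label (assumed value)
    """
    l_w = len(probs)
    labels = [1 if g > theta_iop else 0 for g in iop_scores]
    l_plus = sum(iop_scores)          # "amount" of positives in the window
    l_minus = l_w - l_plus
    if l_plus <= 0 or l_minus <= 0:
        raise ValueError("window needs both positive and negative mass")
    alpha_pos = l_w / l_plus          # up-weights the rarer positive class
    alpha_neg = l_w / l_minus
    total = 0.0
    for p, b in zip(probs, labels):
        total += alpha_pos * b * math.log(p + eps)
        total += alpha_neg * (1 - b) * math.log(1 - p + eps)
    return -total / l_w


iop = [0.9, 0.1, 0.0, 0.0]
good = weighted_bl_loss([0.9, 0.1, 0.1, 0.1], iop)
bad = weighted_bl_loss([0.1, 0.9, 0.9, 0.9], iop)
```

Predictions that agree with the labels (`good`) give a smaller loss than predictions that contradict them (`bad`).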
* **Learn to choose right proposal**:
To choose the right proposal based on the confidence score, the confidence score should be pushed to match the IoU between the proposal and the ground truth. The loss for this is:
$L_p = \frac{1}{N_{train}} \sum_{i=1}^{N_{train}} (p_{conf,i}-g_{iou,i})^2$, where $N_{train}$ is the number of training proposals, sampled so that the ratio of positive to negative proposals is 1:2, and $g_{iou,i}$ is the $i$-th proposal's overlap with its corresponding ground truth.
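The regression loss and the 1:2 sampling could look roughly like the following. This is a sketch under assumptions: the `pos_threshold=0.7` cut-off that separates positive from negative proposals is illustrative and not stated in this summary, and both function names are hypothetical.

```python
import random


def sample_training_proposals(proposals, pos_threshold=0.7, neg_per_pos=2, seed=0):
    """Keep all positives and sample negatives at a 1:neg_per_pos ratio.

    proposals: list of (p_conf, g_iou) pairs.
    pos_threshold is an assumed IoU cut-off for calling a proposal positive.
    """
    rng = random.Random(seed)
    positives = [p for p in proposals if p[1] > pos_threshold]
    negatives = [p for p in proposals if p[1] <= pos_threshold]
    num_neg = min(len(negatives), neg_per_pos * len(positives))
    return positives + rng.sample(negatives, num_neg)


def confidence_loss(batch):
    """Mean squared error between confidence scores and ground-truth IoU."""
    return sum((p_conf - g_iou) ** 2 for p_conf, g_iou in batch) / len(batch)


all_proposals = [(0.8, 0.9), (0.6, 0.75)] + [(0.3, 0.1)] * 10
batch = sample_training_proposals(all_proposals)
loss = confidence_loss(batch)
```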
During test and prediction, the final confidence score of each proposal is $p_f = p_{conf}\, p_{t_s}^s\, p_{t_e}^e$, and it is used to rank and suppress proposals with Gaussian-decaying soft-NMS.
Thus, after training, the confidence score should reflect the IoU between the proposal and its corresponding ground truth, based on the proposal feature generated from the actionness probability, whereas the final proposals are obtained by ranking the final confidence score.
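The scoring and suppression step can be illustrated with a generic Gaussian soft-NMS over 1-D segments. This is a sketch, not the paper's exact procedure: it decays every remaining proposal by $e^{-IoU^2/\sigma}$ with no overlap threshold, and `sigma`, `top_k`, and the function names are assumptions made for the example.

```python
import math


def final_score(p_conf, p_start, p_end):
    """Fuse confidence with the boundary probabilities: p_f = p_conf * p_s * p_e."""
    return p_conf * p_start * p_end


def temporal_iou(a, b):
    """IoU of two segments given as (t_start, t_end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(proposals, sigma=0.5, top_k=100):
    """Gaussian soft-NMS on (t_start, t_end, score) triples.

    Repeatedly takes the best-scoring proposal and decays the scores of
    the rest according to their overlap with it, instead of deleting them.
    """
    remaining = [list(p) for p in proposals]
    kept = []
    while remaining and len(kept) < top_k:
        best = max(range(len(remaining)), key=lambda i: remaining[i][2])
        t_s, t_e, score = remaining.pop(best)
        kept.append((t_s, t_e, score))
        for other in remaining:
            iou = temporal_iou((t_s, t_e), (other[0], other[1]))
            other[2] *= math.exp(-(iou ** 2) / sigma)
    return kept


ranked = soft_nms([(0, 10, 0.9), (1, 10, 0.8), (20, 30, 0.5)])
```

In this example the near-duplicate segment `(1, 10)` is pushed below the non-overlapping `(20, 30)` rather than being discarded outright.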
**Conclusion**: Unlike methods that score segment proposals or use an RNN to decide where to look next, this paper generates proposals from boundary probabilities and selects them using a confidence score trained to match the IoU between the proposal and its corresponding ground truth. With sufficient data, it can provide accurate boundary probabilities and confidence scores. The highlight of the paper is that it can be very accurate within the feature sequence.
*However, it only samples part of the video for the feature sequence, so it may jump over a boundary point. If an accurate policy for deciding where to sample were used, accuracy should be further boosted.*
* **computation complexity**: within this network, computation includes
1. two-stream feature extractor for video samples
2. probability generation module: 3-layer CNN over the generated sequence
3. proposal generation by combining boundaries
4. sampler to generate the proposal feature
5. 1-hidden-layer perceptron to generate the confidence score.
The major computational cost should come from the feature extractor (step 1) and, if many proposals are generated, from the proposal-related modules (steps 3 and 4).
**Performance**: when combined with the SCNN classifier, it reaches mAP@0.5 = 36.9 on the THUMOS14 dataset
arXiv e-Print archive - 2018 via Local arXiv

Keywords: cs.CV

**First published:** 2018/06/08 (1 year ago)

**Abstract:** Temporal action proposal generation is an important yet challenging problem,
since temporal proposals with rich action content are indispensable for
analysing real-world videos with long duration and high proportion irrelevant
content. This problem requires methods not only generating proposals with
precise temporal boundaries, but also retrieving proposals to cover truth
action instances with high recall and high overlap using relatively fewer
proposals. To address these difficulties, we introduce an effective proposal
generation method, named Boundary-Sensitive Network (BSN), which adopts "local
to global" fashion. Locally, BSN first locates temporal boundaries with high
probabilities, then directly combines these boundaries as proposals. Globally,
with Boundary-Sensitive Proposal feature, BSN retrieves proposals by evaluating
the confidence of whether a proposal contains an action within its region. We
conduct experiments on two challenging datasets: ActivityNet-1.3 and THUMOS14,
where BSN outperforms other state-of-the-art temporal action proposal
generation methods with high recall and high temporal precision. Finally,
further experiments demonstrate that by combining existing action classifiers,
our method significantly improves the state-of-the-art temporal action
detection performance.