Video description aims to generate descriptive natural language for videos. Inspired by the deep neural networks (DNNs) used in machine translation, the video description (VD) task applies a convolutional neural network (CNN) to extract video features and a long short-term memory (LSTM) network to generate descriptions. However, some models generate incorrect words and syntax. A possible reason is that previous models apply only an LSTM to generate sentences and therefore learn insufficient linguistic information. To solve this problem, an end-to-end DNN model incorporating subject, verb and object (SVO) supervision is proposed. Experimental results on a publicly available dataset, Youtube2Text, indicate that our model achieves a consensus-based image description evaluation (CIDEr) value of 58.4%. It outperforms the mean pool and video description with first feed (VD-FF) models, demonstrating the effectiveness of SVO supervision.
Community-based question answering (CQA) plays a prominent role in the development of social networks. Similar question retrieval is one of the most important tasks in CQA. Most previous work on similar question retrieval rests on the underlying assumption that answers are similar if their questions are similar, but none of it models the similarity measure under the constraint of that assumption. A new method of modeling the similarity measure is proposed: it constrains the measure with the assumption and employs ensemble learning to obtain a comprehensive measure that integrates different context features for similarity measuring, including lexical, syntactic, semantic and latent semantic features. Experiments indicate that the integrated model achieves relatively high consistency in performance between the question set and the answer set. Models with better consistency tend to achieve better precision with respect to answers.
SUN Yue-ping, WANG Xiao-jie, WANG Xu-wen, JIANG Shao-wei, LIU Yong-bin