To assist the design of short interfering ribonucleic acids (siRNAs), 573 non-redundant siRNAs were collected from the published literature, and the relationship between siRNA sequence and RNA interference (RNAi) effect was analyzed with a support vector machine (SVM) based algorithm relying on a base-base correlation (BBC) feature. The results show that the proposed algorithm achieves the highest area under the curve (AUC) of the receiver operating characteristic (ROC) curve (0.73) and the greatest Pearson's correlation coefficient (r = 0.43). This indicates that the proposed algorithm outperforms the published algorithms on the collected datasets and that more attention should be paid to base-base correlation information in future siRNA design.
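The two evaluation measures reported above, the AUC of the ROC curve and Pearson's r, can be computed as in the following minimal sketch. It is an illustration only, not the authors' evaluation code: the function names `roc_auc` and `pearson_r` are hypothetical, and the binary labelling of siRNAs as effective (1) or ineffective (0) is an assumption, since the abstract does not state the threshold used.

```python
from math import sqrt


def roc_auc(labels, scores):
    """AUC as the probability that a random positive example is scored
    above a random negative one (ties count as one half).

    labels: 1 for effective siRNAs, 0 for ineffective ones (assumed coding).
    scores: predicted efficacy, higher meaning more effective.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pearson_r(x, y):
    """Pearson's correlation coefficient between predicted and measured values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported 0.73 lies between the two. The pairwise loop is quadratic in the number of examples, which is acceptable for a data set of a few hundred siRNAs.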
A novel method for predicting hotspots and coldspots using a support vector machine (SVM), based on statistical learning theory, is developed. The method is applied to 303 hot and 48 cold published open reading frames (ORFs) in Saccharomyces cerevisiae. Two sequence features are extracted, general dinucleotide abundance and dinucleotide abundance based on codon usage, and the data sets are then classified with different parameters and kernel functions under two-fold cross-validation. The results indicate that an accuracy of 87.47% can be reached when classifying hot and cold ORF sequences using the radial basis function kernel combined with dinucleotide abundance based on codon usage.
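As an illustration of the general dinucleotide abundance feature, the sketch below turns an ORF sequence into a 16-dimensional vector. The abstract does not define the feature, so the odds-ratio form f_xy / (f_x f_y) used here is an assumption, as are the names `DINUCLEOTIDES` and `dinucleotide_abundance`. The codon-usage-based variant is not covered because its definition is not given.

```python
from itertools import product

BASES = "ACGT"
# The 16 dinucleotides in a fixed order, so feature vectors are comparable.
DINUCLEOTIDES = ["".join(pair) for pair in product(BASES, repeat=2)]


def dinucleotide_abundance(seq):
    """Return 16 relative-abundance values f_xy / (f_x * f_y) for a DNA sequence.

    Assumed definition (not taken from the abstract): observed dinucleotide
    frequency divided by the product of the two mononucleotide frequencies.
    Characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()

    mono = {base: 0 for base in BASES}
    for ch in seq:
        if ch in mono:
            mono[ch] += 1
    total_mono = sum(mono.values())

    di = {pair: 0 for pair in DINUCLEOTIDES}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in di:
            di[pair] += 1
    total_di = sum(di.values())

    features = []
    for pair in DINUCLEOTIDES:
        f_x = mono[pair[0]] / total_mono if total_mono else 0.0
        f_y = mono[pair[1]] / total_mono if total_mono else 0.0
        f_xy = di[pair] / total_di if total_di else 0.0
        expected = f_x * f_y
        # A dinucleotide whose expected frequency is zero gets the value 0.0.
        features.append(f_xy / expected if expected > 0 else 0.0)
    return features
```

A vector of this kind could then be passed to an SVM with a radial basis function kernel. Values above 1 mark dinucleotides that occur more often than the base composition alone would predict, and values below 1 mark those that occur less often.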