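Conditional random fields (CRFs) can be used to solve a variety of text analysis problems, such as sequence labeling, Chinese word segmentation, named entity recognition, and relation extraction between entities in natural language processing (NLP). Traditional CRFs that run on a single node face a series of challenges when processing large-scale text. On the one hand, personal computers hit processing bottlenecks and cannot handle the workload; on the other hand, servers execute inefficiently. Upgrading server hardware to increase computing power does not fundamentally solve the problem for large-scale text analysis tasks. To address this, we adopt a divide-and-conquer approach and, building on the Apache Spark big data processing framework, design and implement SparkCRF, a distributed CRF that runs in a cluster environment. Experiments show that, on text analysis tasks, SparkCRF offers high computational efficiency and good scalability, with accuracy on par with the traditional single-node CRF++.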
Because of the simplicity of cells, the key to building biological computing systems may lie in constructing distributed systems based on cell–cell communication. Guided by a mathematical model, in this study we designed, simulated, and constructed a genetic double-branch structure in the bacterium Escherichia coli. This genetic double-branch structure is composed of a control cell and two reporter cells. The control cell can activate different reporter cells according to the input. Two quorum-sensing signal molecules, 3OC12-HSL and C4-HSL, form the wires between the control cell and the reporter cells. This study is a step toward scalable biological computation, and it may have many potential applications in biocomputing, biosensing, and biotherapy.