Public Cultural Service Platform

4 records in total; showing 1-4.

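A Survey of Reinforcement Learning Applications in Machine Game Playing (强化学习在机器博弈上的应用综述). Cited by: 2. Year: 2021.
Abstract: Artificial intelligence is an inevitable trend in future technological development and will have a profound impact on the world, and machine game playing is a hot topic in AI research. At present, the most advanced algorithms for solving machine game-playing problems all derive from reinforcement learning. Reinforcement learning is one of the most important methods in machine learning and is mainly used to solve decision-making problems. Its learning mechanism resembles human thinking: the agent interacts with the environment by trial and error, accumulates the maximum reward, and thereby obtains the optimal policy. Games take many forms and cover a wide range of content, and different criteria yield different classifications. For example, games can be divided into perfect-information and imperfect-information games, but both kinds can be solved with reinforcement learning.
Authors: Du Kanghao, Song Ruizhuo, Wei Qinglai
Keywords: machine game playing; Go; DOTA2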
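The trial-and-error mechanism summarized in this abstract can be illustrated with tabular Q-learning, one standard reinforcement-learning algorithm. This is a minimal sketch, not a method taken from the survey. The five-state chain environment, its reward of 1 at the goal, and the hyperparameters (`alpha`, `gamma`, `epsilon`, number of episodes) are all hypothetical choices made for illustration.

```python
import random

# Hypothetical toy environment (not from the survey): states 0..4 on a line.
# Action 0 moves left, action 1 moves right; reaching state 4 ends the episode
# with reward 1, and every other transition gives reward 0.
N_STATES = 5
GOAL = 4
ACTIONS = (0, 1)


def step(state: int, action: int) -> tuple[int, float, bool]:
    """Return (next_state, reward, done) for the toy chain."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: learn action values purely from trial-and-error interaction."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)  # explore: try a random action
            else:
                best = max(q[s])  # exploit, breaking ties at random
                a = rng.choice([act for act in ACTIONS if q[s][act] == best])
            s2, r, done = step(s, a)
            # Move Q(s, a) toward the one-step target r + gamma * max_a' Q(s', a').
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q


q = q_learning()
policy = [max(ACTIONS, key=lambda act: q[s][act]) for s in range(GOAL)]
print(policy)  # greedy action in states 0..3 (1 = move right)
```

After training, the greedy policy should move right in every non-terminal state, because that path collects the discounted reward soonest. No reward model is given to the learner; it only observes the outcomes of its own actions.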

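Optimal Tracking Control for a Class of Unknown Discrete-time Systems with Actuator Saturation via Data-based ADP Algorithm. Cited by: 3. Year: 2013.
Abstract: A novel optimal tracking control method is proposed for a class of discrete-time systems with actuator saturation and unknown dynamics. The scheme is based on the iterative adaptive dynamic programming (ADP) algorithm. To implement the control scheme, a data-based identifier is first constructed for the unknown system dynamics. By introducing the M network, an explicit formula for the steady control is obtained. To eliminate the effect of actuator saturation, a nonquadratic performance function is introduced, and an iterative ADP algorithm with convergence analysis is then established to obtain the optimal tracking control solution. To implement the optimal control method, neural networks are used, respectively, to build the data-based identifier, compute the performance index function, approximate the optimal control policy, and solve for the steady control. Simulation examples are provided to verify the effectiveness of the presented optimal tracking control scheme.
Authors: SONG Rui-Zhuo, XIAO Wen-Dong, SUN Chang-Yin
Keywords: optimal tracking control; discrete-time systems; saturating actuators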
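The abstract does not state which nonquadratic performance function is used. A form that is common in the ADP literature on saturated actuators is assumed here for a scalar control bounded by u_max: W(u) = 2 ∫_0^u u_max·atanh(v/u_max)·r dv. Its marginal cost 2·u_max·r·atanh(u/u_max) grows without bound as |u| approaches u_max. As a result, minimizing W(u) plus a linear term c·u yields a tanh-shaped control that never leaves the admissible range. The sketch below evaluates the closed form of W and that minimizer; the function names are hypothetical. Treating c as a stand-in for the value-function term of an actual ADP update is an assumption about how this sketch maps onto the paper.

```python
import math


def nonquadratic_cost(u: float, u_max: float, r: float = 1.0) -> float:
    """Closed form of W(u) = 2 * integral_0^u u_max * atanh(v / u_max) * r dv, for |u| < u_max."""
    if abs(u) >= u_max:
        raise ValueError("the cost is only evaluated for |u| < u_max")
    x = u / u_max
    return 2.0 * u_max * r * u * math.atanh(x) + r * u_max**2 * math.log(1.0 - x * x)


def bounded_minimizer(c: float, u_max: float, r: float = 1.0) -> float:
    """Minimizer of W(u) + c * u: solving 2*u_max*r*atanh(u/u_max) = -c gives a tanh law."""
    return -u_max * math.tanh(c / (2.0 * u_max * r))


u_max = 2.0
for c in (0.5, 5.0, 50.0):
    # The control stays inside [-u_max, u_max] however large the linear term c becomes.
    print(c, bounded_minimizer(c, u_max))
```

For small |u|, W(u) is approximately r·u², so the cost behaves like the usual quadratic penalty away from the bound. The saturation is handled by the shape of the cost rather than by clipping the control afterwards.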

Approximation-error-ADP-based optimal tracking control for chaotic systems with convergence proof. Year: 2013.
Abstract: In this paper, an optimal tracking control scheme is proposed for a class of discrete-time chaotic systems using the approximation-error-based adaptive dynamic programming (ADP) algorithm. Via a system transformation, the optimal tracking problem is transformed into an optimal regulation problem, and then the novel optimal tracking control method is proposed. It is shown that, for the iterative ADP algorithm with finite approximation error, the iterative performance index functions can converge to a finite neighborhood of the greatest lower bound of all performance index functions under some convergence conditions. Two examples are given to demonstrate the validity of the proposed optimal tracking control scheme for chaotic systems.
Authors: Song Ruizhuo, Xiao Wendong, Sun Changyin, Wei Qinglai
Keywords: optimal tracking control; convergence proof; ADP
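The transformation from tracking to regulation and the iterative performance index can be made concrete in the simplest possible setting, which is assumed here for illustration only. Consider a scalar linear system x_{k+1} = a·x_k + b·u_k tracking a constant reference η, with quadratic utility q·e² + r·v². The paper itself treats nonlinear chaotic systems with neural-network approximation. Define the tracking error e_k = x_k - η and the steady control u_d by η = a·η + b·u_d. The error then obeys e_{k+1} = a·e_k + b·v_k, where v_k = u_k - u_d. Starting from V_0 = 0, each iterate keeps the form V_i(e) = p_i·e², so the ADP iteration reduces to a scalar Riccati recursion. The `perturbation` argument is a hypothetical, crude stand-in for the finite approximation error.

```python
import math


def value_iteration(a: float, b: float, q: float = 1.0, r: float = 1.0,
                    iters: int = 50, perturbation: float = 0.0):
    """Value iteration V_{i+1}(e) = min_v [q e^2 + r v^2 + V_i(a e + b v)], V_0 = 0.

    For this scalar linear-quadratic case V_i(e) = p_i * e^2, so each iteration is a
    Riccati update of p_i. `perturbation` is a crude stand-in for a bounded
    approximation error added at every iteration.
    """
    p = 0.0
    history = [p]
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p) + perturbation
        history.append(p)
    # v = -gain * e minimizes r v^2 + p (a e + b v)^2 for the final iterate p.
    gain = a * b * p / (r + b * b * p)
    return p, gain, history


def simulate_tracking(a: float, b: float, eta: float, x0: float, gain: float, steps: int = 40):
    """Track the constant reference eta with u = u_d - gain * (x - eta)."""
    u_d = (1.0 - a) * eta / b  # steady control: eta = a * eta + b * u_d
    x, traj = x0, [x0]
    for _ in range(steps):
        x = a * x + b * (u_d - gain * (x - eta))
        traj.append(x)
    return traj


p, gain, history = value_iteration(a=1.0, b=1.0)
print(history[:4], p)  # nondecreasing iterates approaching (1 + sqrt(5)) / 2

p2, gain2, _ = value_iteration(a=1.2, b=1.0)
print(simulate_tracking(a=1.2, b=1.0, eta=1.0, x0=0.0, gain=gain2)[-1])
```

With a = b = q = r = 1, the iterates 0, 1, 1.5, 1.6, ... increase toward (1 + √5)/2. A constant perturbation of 0.01 instead makes them settle slightly above that value. This loosely mirrors the abstract's statement that, with finite approximation error, the iterates converge to a neighborhood of the optimum rather than to the optimum itself.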

Optimal Constrained Self-learning Battery Sequential Management in Microgrid Via Adaptive Dynamic Programming. Cited by: 13. Year: 2017.
Abstract: This paper concerns a novel optimal self-learning battery sequential control scheme for smart home energy systems. The main idea is to use the adaptive dynamic programming (ADP) technique to obtain the optimal battery sequential control iteratively. First, the battery energy management system model is established, where the power efficiency of the battery is considered. Next, considering the power constraints of the battery, a new non-quadratic performance index function is established. It guarantees that the iterative control law cannot exceed the maximum charging/discharging power of the battery, which extends the service life of the battery. Then, the convergence properties of the iterative ADP algorithm are analyzed, guaranteeing that the iterative value function and the iterative control law both reach their optima. Finally, simulation and comparison results are given to illustrate the performance of the presented method.
Authors: Qinglai Wei, Derong Liu, Yu Liu, Ruizhuo Song

National Natural Science Foundation of China (61304079)