
National Natural Science Foundation of China (61103045)

Publications: 2  Citations: 3  H-index: 1
Related authors: Qiming FU, Quan LIU, Jin LI, Xudong YANG, Ling JING
Related institutions: Soochow University, Jilin University, Nanjing University
Funding: National Natural Science Foundation of China; Natural Science Research Project of Jiangsu Higher Education Institutions; Natural Science Foundation of Jiangsu Province
Related fields: Automation and Computer Technology

Document Type

  • Chinese journal articles (2)

Fields

  • Automation and Computer Technology (2)

Topics

  • Scheduling (1)
  • Intelligent scheduling (1)
  • Reinforcement learning methods (1)
  • Scalable (1)
  • Scalability (1)
  • Extensibility (1)
  • REINFORCEMENT LEARNING (1)
  • LARGE (1)
  • Parallel computing (1)
  • PARALLEL (1)
  • SCALABILITY (1)
  • SCHEDULING (1)

Institutions

  • Nanjing University (1)
  • Jilin University (1)
  • Soochow University (1)

Authors

  • Jiao LI (1)
  • Ling JING (1)
  • Xudong YANG (1)
  • Jin LI (1)
  • Quan LIU (1)
  • Qiming FU (1)

Journals

  • Journal of Computer Research and Development (1)
  • Frontiers of Computer Science (1)

Year

  • 2013 (1)
  • 2012 (1)
2 records, showing 1-2
A parallel scheduling algorithm for reinforcement learning in large state space
2012
The main challenge in reinforcement learning is scaling up to larger and more complex problems. To address this scaling problem, a scalable reinforcement learning method, DCS-SRL, is proposed on the basis of a divide-and-conquer strategy, and its convergence is proved. In this method, a learning problem in a large or continuous state space is decomposed into multiple smaller subproblems. Given a specific learning algorithm, each subproblem can be solved independently with limited available resources, and the component solutions are then recombined to obtain the desired result. To address the question of how to prioritize subproblems in the scheduler, a weighted priority scheduling algorithm is proposed; it ensures that computation is focused on the regions of the problem space expected to be maximally productive. To expedite the learning process, a parallel method, DCS-SPRL, is derived by combining DCS-SRL with a parallel scheduling architecture; its subproblems are distributed among processors that can work in parallel. Experimental results show that learning based on DCS-SPRL converges quickly and scales well.
Quan LIU, Xudong YANG, Ling JING, Jin LI, Jiao LI
Keywords: SCALABILITY
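The weighted priority scheduling described in the abstract above can be illustrated with a minimal Python sketch. Everything here (the Subproblem class, the priority weights, and the solve interface) is an illustrative assumption, not the paper's actual DCS-SRL scheduler: priorities combine each subproblem's latest value-function change with a novelty bonus, so computation concentrates on blocks expected to be maximally productive.

```python
import heapq

# Illustrative sketch only: the Subproblem class, priority weights, and
# solve interface are assumptions, not the paper's DCS-SRL scheduler.

class Subproblem:
    def __init__(self, sid, states):
        self.sid = sid                          # unique id, used as a tie-breaker
        self.states = states                    # the states this block covers
        self.max_bellman_error = float("inf")   # unvisited blocks are tried first
        self.visits = 0

def priority(sp, w_error=0.7, w_novelty=0.3):
    # Weighted combination of scheduling signals (weights are assumed):
    # blocks with large recent value-function change and few visits are
    # treated as the most productive places to spend computation.
    novelty = 1.0 / (1.0 + sp.visits)
    return w_error * sp.max_bellman_error + w_novelty * novelty

def schedule(subproblems, solve, rounds=100, tol=1e-3):
    # heapq is a min-heap, so priorities are negated for max-first order.
    heap = [(-priority(sp), sp.sid, sp) for sp in subproblems]
    heapq.heapify(heap)
    for _ in range(rounds):
        if not heap:
            break                                # every block has converged
        _, _, sp = heapq.heappop(heap)
        sp.max_bellman_error = solve(sp)         # run the learner on this block
        sp.visits += 1
        if sp.max_bellman_error > tol:           # not converged yet: requeue
            heapq.heappush(heap, (-priority(sp), sp.sid, sp))
```

Here solve is expected to run the chosen learning algorithm over the subproblem's states and return the largest value-function change it observed; blocks whose values are still moving float back to the top of the queue, while converged blocks drop out.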
A Scalable Parallel Reinforcement Learning Method Based on Intelligent Scheduling (Cited: 3)
2013
Reinforcement learning suffers from the "curse of dimensionality" in large or continuous state spaces. To tackle this problem, a scalable reinforcement learning method based on intelligent scheduling (IS-SRL) is proposed, analyzed theoretically, and proved convergent. The method partitions a large state space into blocks by a divide-and-conquer strategy, so that each block fits in memory and can be learned independently. After a block has been learned for one period, it is swapped out to external storage and the next block is swapped in; blocks exchange information during swapping so that the overall learning task converges to the optimal solution. Because the order in which blocks are learned strongly affects learning efficiency, a novel intelligent scheduling algorithm is also proposed: exploiting the distribution of value-function updates in reinforcement learning, and the idea of weighting the priorities of multiple scheduling strategies, it concentrates learning on the subproblem space expected to yield the greatest benefit, which safeguards the efficiency of IS-SRL. Integrating a parallel scheduling framework into this algorithm, with multiple agents learning simultaneously, yields a parallel version, IS-SPRL. Experimental results show that IS-SPRL converges quickly and scales well.
Quan LIU, Qiming FU, Xudong YANG, Ling JING, Jin LI, Jiao LI
Keywords: parallel computing; scalability; intelligent scheduling
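The swap-in/swap-out learning loop described in the abstract above can be sketched as follows. This is a minimal illustration under assumed interfaces: the on-disk pickle layout, the placeholder learner, and the single shared boundary table are simplifications, not the paper's IS-SRL implementation, which would also choose the next block by weighted priority rather than round-robin.

```python
import os
import pickle
import tempfile

# Illustrative sketch only: file layout, learner, and boundary-exchange
# scheme are assumptions, not the paper's IS-SRL implementation.

SWAP_DIR = tempfile.mkdtemp(prefix="is_srl_")   # stands in for external storage

def swap_out(block_id, values):
    # Persist a block's value table when its learning period ends.
    with open(os.path.join(SWAP_DIR, f"block_{block_id}.pkl"), "wb") as f:
        pickle.dump(values, f)

def swap_in(block_id, init_states=()):
    # Load a block's value table, creating it on the first visit.
    path = os.path.join(SWAP_DIR, f"block_{block_id}.pkl")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {s: 0.0 for s in init_states}

def learn_one_period(values, boundary, alpha=0.1, gamma=0.9):
    # Placeholder learner: one sweep of updates inside the block,
    # bootstrapping from values exported by the other blocks.
    target = gamma * max(boundary.values(), default=0.0)
    delta = 0.0
    for s in values:
        new_v = values[s] + alpha * (target - values[s])
        delta = max(delta, abs(new_v - values[s]))
        values[s] = new_v
    return delta

def train(blocks, periods=50, tol=1e-4):
    # blocks maps block_id -> iterable of states; the boundary table is
    # the information exchanged between blocks while swapping.
    boundary = {"goal": 1.0}                     # assumed terminal value
    for _ in range(periods):
        worst = 0.0
        for bid, states in blocks.items():       # round-robin order here
            values = swap_in(bid, states)
            worst = max(worst, learn_one_period(values, boundary))
            boundary.update(values)              # export info to neighbours
            swap_out(bid, values)
        if worst < tol:                          # whole task has converged
            break
```

For example, train({0: range(100), 1: range(100, 200)}) would alternate the two blocks through memory, exchanging boundary values at each swap, until the value tables stop changing.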