<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Paper Publication | Takato Horii | Osaka University</title><link>https://www.takatohorii.jp/en/tags/paper-publication/</link><atom:link href="https://www.takatohorii.jp/en/tags/paper-publication/index.xml" rel="self" type="application/rss+xml"/><description>Paper Publication</description><generator>HugoBlox Kit (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Thu, 21 May 2026 00:00:00 +0000</lastBuildDate><image><url>https://www.takatohorii.jp/media/icon_hu_da05098ef60dc2e7.png</url><title>Paper Publication</title><link>https://www.takatohorii.jp/en/tags/paper-publication/</link></image><item><title>New Paper Published in IEEE Access: Distance-Aware World Model-based Reinforcement Learning</title><link>https://www.takatohorii.jp/en/blog/ieee-access-distance-aware-world-model/</link><pubDate>Thu, 21 May 2026 00:00:00 +0000</pubDate><guid>https://www.takatohorii.jp/en/blog/ieee-access-distance-aware-world-model/</guid><description>&lt;p&gt;Our paper &amp;ldquo;Distance-Aware World Model-based Reinforcement Learning for Mobile Manipulation Behaviors,&amp;rdquo; authored by Xiaoxu Feng (doctoral student) and Takato Horii, has been accepted and published as Early Access (open access, CC-BY) in &lt;em&gt;IEEE Access&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;This work tackles end-to-end learning of mobile manipulation by jointly addressing &lt;strong&gt;embodiment selection&lt;/strong&gt; (whether to move the base or use the arm) and &lt;strong&gt;motion planning&lt;/strong&gt; within a single world model-based reinforcement learning framework. To overcome the limitations of similarity-based rewards in latent space, we explicitly incorporate 3D spatial information into the world model and derive a &lt;strong&gt;distance-aware reward&lt;/strong&gt; for policy training. At the high level, a reachability-based reward enables rational switching between locomotion and manipulation. Extensive simulation experiments with both fixed and randomized target settings show that our approach achieves near-perfect motion success rates and substantially more rational arm activations than baseline composite-reward formulations.&lt;/p&gt;
&lt;p&gt;This work was supported by JST Moonshot R&amp;amp;D (JPMJMS2011).&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Xiaoxu Feng, Takato Horii (Graduate School of Engineering Science, The University of Osaka)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Journal:&lt;/strong&gt; &lt;em&gt;IEEE Access&lt;/em&gt; (Early Access, May 21, 2026)&lt;/li&gt;
&lt;/ul&gt;</description></item></channel></rss>