{"id":277,"date":"2025-12-22T13:56:20","date_gmt":"2025-12-22T13:56:20","guid":{"rendered":"https:\/\/intention.put.poznan.pl\/?p=277"},"modified":"2025-12-22T16:49:10","modified_gmt":"2025-12-22T16:49:10","slug":"one-policy-to-run-them-all-an-end-to-end-learningapproach-to-multi-embodiment-locomotion","status":"publish","type":"post","link":"https:\/\/intention.put.poznan.pl\/?p=277","title":{"rendered":"One Policy to Run Them All: an End-to-end LearningApproach to Multi-Embodiment Locomotion"},"content":{"rendered":"<figure class=\"wp-block-post-featured-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1933\" height=\"1433\" src=\"https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/robots.png\" class=\"attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"\" style=\"object-fit:cover;\" srcset=\"https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/robots.png 1933w, https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/robots-300x222.png 300w, https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/robots-1024x759.png 1024w, https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/robots-768x569.png 768w, https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/robots-1536x1139.png 1536w\" sizes=\"auto, (max-width: 1933px) 100vw, 1933px\" \/><\/figure>\n\n\n<p>Do you need a new locomotion policy for every new robot? No! 
We can train a single general locomotion policy for any legged robot embodiment and morphology!<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Unified Robot Morphology Architecture<\/h4>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"325\" src=\"https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/architecture-1024x325.png\" alt=\"\" class=\"wp-image-280\" srcset=\"https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/architecture-1024x325.png 1024w, https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/architecture-300x95.png 300w, https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/architecture-768x244.png 768w, https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/architecture-1536x487.png 1536w, https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/architecture-2048x649.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Our Unified Robot Morphology Architecture (URMA) can handle any robot morphology. To achieve this, we design our network with three main components:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A <strong>simple attention encoder<\/strong> for joint and foot information<\/li>\n\n\n\n<li>A <strong>core network<\/strong>, which learns the meta locomotion gait<\/li>\n\n\n\n<li>A <strong>universal decoder<\/strong>, generating actions for each joint<\/li>\n<\/ul>\n\n\n\n<h5 class=\"wp-block-heading\">The URMA encoder<\/h5>\n\n\n\n<p>To handle observations of any morphology, URMA splits observations into robot-specific and general parts. Robot-specific observations are joint (and foot) observations. They have the same structure but vary in number depending on the robot. Since neural networks require fixed-length input vectors, we need a mechanism that can take any set of joint observations and route it into a latent vector that holds the information of all joints. 
Similar joints from different robots should map to similar regions in the latent vector.<\/p>\n\n\n\n<p>URMA uses a simple attention encoder in which the joint description vectors act as the keys and the joint observations as the values of the attention mechanism:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"463\" src=\"https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/website_architecture_1-1024x463.png\" alt=\"\" class=\"wp-image-281\" srcset=\"https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/website_architecture_1-1024x463.png 1024w, https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/website_architecture_1-300x136.png 300w, https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/website_architecture_1-768x347.png 768w, https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/website_architecture_1-1536x695.png 1536w, https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/website_architecture_1-2048x926.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>The same attention encoding is used for the foot observations.<\/p>\n\n\n\n<h5 class=\"wp-block-heading\">The Core network<\/h5>\n\n\n\n<p>The resulting joint and foot latent vectors are concatenated with the general observations and passed to the policy&#8217;s core network. 
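<\/p>\n\n\n\n<p>To make this concrete, below is a minimal sketch of the attention encoding and the concatenation step in plain NumPy. All names, shapes, and the random projection matrices are illustrative assumptions, not the exact implementation:<\/p>

```python
import numpy as np

def softmax(x, axis=0):
    # numerically stable softmax over the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def encode_set(descriptions, observations, latent_dim=8, seed=0):
    # Hypothetical sketch: descriptions act as attention keys and
    # observations as values; the weighted sum yields one fixed-length
    # latent no matter how many joints (or feet) the robot has.
    rng = np.random.default_rng(seed)
    w_key = rng.normal(size=(descriptions.shape[1], latent_dim))  # placeholder for learned weights
    w_val = rng.normal(size=(observations.shape[1], latent_dim))  # placeholder for learned weights
    keys = descriptions @ w_key        # (num_joints, latent_dim)
    values = observations @ w_val      # (num_joints, latent_dim)
    weights = softmax(keys, axis=0)    # normalized over the joint set
    return (weights * values).sum(axis=0)  # (latent_dim,)

rng = np.random.default_rng(1)
# A robot with 12 joints and 4 feet: both sets go through the encoder ...
joint_latent = encode_set(rng.normal(size=(12, 5)), rng.normal(size=(12, 3)))
foot_latent = encode_set(rng.normal(size=(4, 5)), rng.normal(size=(4, 2)), seed=1)
general_obs = rng.normal(size=(6,))
# ... and the core network input is always the same fixed size.
core_input = np.concatenate([joint_latent, foot_latent, general_obs])
assert core_input.shape == (22,)
```

<p>The core network itself can then be an ordinary fixed-size network, since its input length no longer depends on the embodiment.<\/p>\n\n\n\n<p>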
The output of the network is the action latent vector, which represents a meta locomotion action and is ready to be decoded into actions for the specific embodiment!<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"324\" src=\"https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/website_architecture_2-1024x324.png\" alt=\"\" class=\"wp-image-282\" srcset=\"https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/website_architecture_2-1024x324.png 1024w, https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/website_architecture_2-300x95.png 300w, https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/website_architecture_2-768x243.png 768w, https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/website_architecture_2-1536x486.png 1536w, https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/website_architecture_2-2048x648.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h5 class=\"wp-block-heading\">The Universal decoder<\/h5>\n\n\n\n<p>Finally, our universal morphology decoder takes the output of the core network and pairs it with the batch of joint descriptions and the single-joint latents to produce the final action for every joint. 
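<\/p>\n\n\n\n<p>A rough NumPy sketch of such a universal decoder, again with illustrative names and shapes (the random weight vector stands in for a learned network):<\/p>

```python
import numpy as np

def decode_actions(action_latent, descriptions, joint_latents, seed=0):
    # Hypothetical sketch: the same decoder weights are applied to every
    # joint; each joint receives the shared action latent paired with its
    # own description and single-joint latent, so one network can emit
    # actions for any number of joints.
    rng = np.random.default_rng(seed)
    n = descriptions.shape[0]
    shared = np.repeat(action_latent[None, :], n, axis=0)   # broadcast to all joints
    inputs = np.concatenate([shared, descriptions, joint_latents], axis=1)
    w = rng.normal(size=(inputs.shape[1],))  # placeholder for learned weights
    return inputs @ w  # one scalar action per joint

rng = np.random.default_rng(2)
actions = decode_actions(rng.normal(size=(8,)),      # action latent from the core network
                         rng.normal(size=(12, 5)),   # 12 joint descriptions
                         rng.normal(size=(12, 4)))   # 12 single-joint latents
assert actions.shape == (12,)  # 12 joints in, 12 actions out
```

<p>Because the decoder is shared across joints and embodiments, a robot with more joints simply means a larger batch of joint descriptions, not a new output head.<\/p>\n\n\n\n<p>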
<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"405\" src=\"https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/website_architecture_3-1024x405.png\" alt=\"\" class=\"wp-image-283\" srcset=\"https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/website_architecture_3-1024x405.png 1024w, https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/website_architecture_3-300x119.png 300w, https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/website_architecture_3-768x304.png 768w, https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/website_architecture_3-1536x607.png 1536w, https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/website_architecture_3-2048x810.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">Results<\/h4>\n\n\n\n<p>Our URMA policy outperforms classic multi-task RL approaches, showing strong robustness and zero-shot capabilities, both in simulation and in the real world!<\/p>\n\n\n\n<h5 class=\"wp-block-heading\">Results in simulation<\/h5>\n\n\n\n<p>We compared our approach against the following baselines:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Single robot baseline<\/strong>: we train a separate network from scratch for each robot<\/li>\n\n\n\n<li><strong>Padding baseline<\/strong>: this approach uses a single network, padding the input and the output of the network to the maximum embodiment input\/output size<\/li>\n\n\n\n<li><strong>Multi-head baseline<\/strong>: the typical multi-task learning architecture, with encoders and decoders learned per embodiment but with a shared core network<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"564\" src=\"https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/urma_results-1-1024x564.jpeg\" 
alt=\"\" class=\"wp-image-284\" srcset=\"https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/urma_results-1-1024x564.jpeg 1024w, https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/urma_results-1-300x165.jpeg 300w, https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/urma_results-1-768x423.jpeg 768w, https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/urma_results-1-1536x846.jpeg 1536w, https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/urma_results-1-2048x1128.jpeg 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>During training, the URMA architecture achieves higher final performances w.r.t. all other baselines. However, URMA shines in the case of zero-shot deployment and adaptation:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If the robot embodiment is similar to other embodiments in the training (Unitree A1), the zero-shot transfer works seamlessly<\/li>\n\n\n\n<li>URMA zero-shot transfer works well even if we remove some observations, for example, if we train with foot contact information and we hide it during deployment<\/li>\n\n\n\n<li>If the robot embodiment is out of distribution, e.g., we try to deploy in our MAB Silver badger robot, we get a bigger performance drop, but we still outperform the baselines in terms of performance.<\/li>\n\n\n\n<li>If we perform fine-tuning in the MAB Silver Badger robot, we achieve better final performance w.r.t. all the baselines.<\/li>\n<\/ul>\n\n\n\n<h5 class=\"wp-block-heading\">Results in the real world<\/h5>\n\n\n\n<p>URMA can be easily deployed in the real world! After simulation training, the policy is deployed on two quadruped robots from the training set in the real world. 
Extensive domain randomization during training allows the policy to transfer directly to real robots without any further adaptation.<\/p>\n\n\n\n<figure class=\"wp-block-video\"><video height=\"1080\" style=\"aspect-ratio: 1920 \/ 1080;\" width=\"1920\" controls src=\"https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/real_world_seen_robots.mp4\"><\/video><\/figure>\n\n\n\n<p>However, thanks to the wide variety of robots in the training set, the randomization of their properties, and the morphology-agnostic URMA architecture, the policy can also generalize to new robots never seen during the training process!<\/p>\n\n\n\n<figure class=\"wp-block-video\"><video height=\"1080\" style=\"aspect-ratio: 1920 \/ 1080;\" width=\"1920\" controls src=\"https:\/\/intention.put.poznan.pl\/wp-content\/uploads\/2025\/12\/real_world_unseen_robots.mp4\"><\/video><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">More information<\/h4>\n\n\n\n<p>If you are interested in URMA, you can take a look at<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Our <strong>CORL 2024<\/strong> paper: <a href=\"https:\/\/openreview.net\/pdf?id=PbQOZntuXO\">https:\/\/openreview.net\/pdf?id=PbQOZntuXO<\/a><\/li>\n\n\n\n<li>The paper website: <a href=\"https:\/\/nico-bohlinger.github.io\/one_policy_to_run_them_all_website\/\">https:\/\/nico-bohlinger.github.io\/one_policy_to_run_them_all_website\/<\/a><\/li>\n<\/ul>\n\n\n\n<p>The project website includes a cool demo of URMA directly embedded in your browser!<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Do you need a new locomotion policy for every new robot? No! We can train a single general locomotion policy for any legged robot embodiment and morphology! Unified Robot Morphology Architecture Our Unified Robot Morphology Architecture can handle any robot morphology. 
To achieve this, we design our network using three main components: The URMA encoder [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":279,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-277","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/intention.put.poznan.pl\/index.php?rest_route=\/wp\/v2\/posts\/277","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/intention.put.poznan.pl\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/intention.put.poznan.pl\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/intention.put.poznan.pl\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/intention.put.poznan.pl\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=277"}],"version-history":[{"count":4,"href":"https:\/\/intention.put.poznan.pl\/index.php?rest_route=\/wp\/v2\/posts\/277\/revisions"}],"predecessor-version":[{"id":314,"href":"https:\/\/intention.put.poznan.pl\/index.php?rest_route=\/wp\/v2\/posts\/277\/revisions\/314"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/intention.put.poznan.pl\/index.php?rest_route=\/wp\/v2\/media\/279"}],"wp:attachment":[{"href":"https:\/\/intention.put.poznan.pl\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=277"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/intention.put.poznan.pl\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=277"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/intention.put.poznan.pl\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=277"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}