{"id":2863485,"date":"2023-09-04T10:00:03","date_gmt":"2023-09-04T14:00:03","guid":{"rendered":"https:\/\/wordpress-1016567-4521551.cloudwaysapps.com\/plato-data\/react-reasoning-and-acting-augments-llms-with-tools-kdnuggets\/"},"modified":"2023-09-04T10:00:03","modified_gmt":"2023-09-04T14:00:03","slug":"react-reasoning-and-acting-augments-llms-with-tools-kdnuggets","status":"publish","type":"station","link":"https:\/\/platodata.io\/plato-data\/react-reasoning-and-acting-augments-llms-with-tools-kdnuggets\/","title":{"rendered":"ReAct, Reasoning and Acting augments LLMs with Tools! – KDnuggets"},"content":{"rendered":"
Short for Reasoning and Acting, this paper<\/a> introduces a new concept that improves the performance of LLMs while also providing more explainability and interpretability.<\/p>\n Achieving AGI could be one of the most important goals for human civilization. Imagine creating an artificial intelligence that can generalize to many problems. There are many interpretations of what AGI is, and of when we can say we have achieved it.<\/p>\n Over the last decades, the most promising path toward AGI was reinforcement learning, most notably what DeepMind achieved on hard tasks: AlphaGo, AlphaStar, and so many breakthroughs\u2026<\/p>\n However, ReAct outperforms imitation and reinforcement learning methods by an absolute success rate of 34% and 10% respectively, while being prompted with only one or two in-context examples.<\/p>\n With this kind of result (provided, of course, that there is no data leakage and we can trust the evaluation methods in the paper), we can no longer ignore LLMs\u2019 potential to reason and divide complex tasks into logical steps.<\/p>\n The paper starts from the idea that LLMs are so far impressive in language understanding: they have been used to generate CoT (chain of thought) reasoning to solve problems, and they have also been used for acting and plan generation.<\/p>\n Although these two abilities have been studied separately, the paper aims to combine reasoning and acting in an interleaved manner to enhance LLMs\u2019 performance.<\/p>\n The intuition comes from thinking about how you, as a human, behave in order to execute a task.<\/p>\n The first step is to use \u201cinner speech\u201d, or to write things down, or to communicate with yourself somehow, saying \u201cHow do I execute task X? 
To do task X, I need to first do step 1, then step 2, and so on.\u201d<\/p>\n More concretely, if you were to cook up a dish in the kitchen, you could ReAct something like this:<\/p>\n You can reason to track progress (\u201cNow that everything is cut, I should heat up the pot of water\u201d), to handle exceptions or adjust the plan according to the situation (\u201cI don\u2019t have salt, so let me use soy sauce and pepper instead\u201d), and to realize when external information is needed (\u201cHow do I prepare dough? Let me search on the Internet\u201d).<\/p>\n You can also act (open a cookbook to read the recipe, open the fridge, check the ingredients) to support the reasoning and to answer questions (\u201cWhat dish can I make right now?\u201d).<\/p>\n This combination of reasoning and acting is what lets humans learn and accomplish tasks even under previously unseen circumstances or when faced with information uncertainty.<\/p>\n Previous work demonstrated the capability of LLMs to reason: Chain-of-Thought prompting, for example, showed that the model can come up with plans to answer questions involving arithmetic, commonsense, and symbolic reasoning.<\/p>\n However, the model here is still a \u201cstatic black box\u201d: it uses its internal language representation to answer these questions, and this representation may not always be accurate or up to date, which leads to fact hallucination (producing facts from its own imagination) or error propagation (one error in the chain of thought propagates to a wrong answer).<\/p>\n Without the ability to take some sort of action and update its knowledge, the model is limited.<\/p>\n There have also been studies that employ LLMs to perform actions based on language. These studies usually take multimodal inputs (audio, text, and images), convert them to text, use the model to generate in-domain actions, and then use a controller to execute those actions.<\/p>\n Without the ability to plan steps and reason about what to do, the model will 
simply output the wrong actions.<\/p>\n The proposal of this paper is to combine the two approaches. ReAct prompts LLMs to generate both verbal reasoning traces and task-specific actions in an interleaved manner, which allows the model to perform dynamic reasoning to create, maintain, and adjust high-level plans for acting (reason to act), while also interacting with external environments (e.g., Wikipedia) to incorporate additional information into the reasoning (act to reason).<\/p>\n This is shown in the figure below:<\/p>\n <\/p>\n To make the reasoning prompting work better, the authors design an action space: three actions that the model is allowed to use when answering questions.<\/p>\n This is done through a Wikipedia API that provides the following: search[entity], which returns the first sentences of the corresponding Wikipedia page (or suggests similar entities if the page does not exist); lookup[string], which returns the next sentence on the page containing the string (simulating Ctrl+F); and finish[answer], which ends the task with the answer.<\/p>\n Notably, there are far more powerful information-retrieval tools than the ones above.<\/p>\n The goal, however, is to simulate human behavior: how a human would interact with Wikipedia and reason to find an answer.<\/p>\n In addition to the provided tools, we need to prompt the LLM properly so that it provides reasoning and chains actions correctly.<\/p>\n To this end, the authors use a combination of thoughts that decompose a question (\u201cI need to search x, find y, then find z\u201d), extract information from Wikipedia observations (\u201cx was started in 1844\u201d, \u201cThe paragraph does not tell x\u201d), perform commonsense reasoning (\u201cx is not y, so z must instead be\u2026\u201d) or arithmetic reasoning (\u201c1844 < 1989\u201d), guide search reformulation (\u201cmaybe I can search\/lookup x instead\u201d), and synthesize the final answer (\u201c\u2026so the answer is x\u201d).<\/p>\n Finally, the results look something like this:<\/p>\n <\/p>\n The datasets chosen for the evaluation are the following:<\/p>\n HotPotQA<\/strong><\/a>: a question-answering dataset that requires reasoning over one or two Wikipedia pages.<\/p>\n 
FEVER<\/strong><\/a>: a fact-verification benchmark where each claim is annotated SUPPORTS, REFUTES, or NOT ENOUGH INFO, based on whether a Wikipedia passage exists to verify the claim.<\/p>\n ALFWorld<\/strong><\/a>: a text-based game that includes six types of tasks the agent needs to perform to achieve a high-level goal.<\/p>\n An example would be \u201cexamine paper under desk lamp\u201d, achieved by navigating and interacting with a simulated household via text actions (e.g., go to coffee table 1, take paper 2, use desk lamp 1).<\/p>\n WebShop<\/strong><\/a>: an online shopping website environment with 1.18M real-world products and 12k human instructions, with much more variety and complexity.<\/p>\n It requires an agent to purchase a product based on a user instruction. For example, given \u201cI am looking for a nightstand with drawers. It should have a nickel finish, and be priced lower than $140\u201d, the agent needs to achieve this through web interactions.<\/p>\n The results show that ReAct always outperforms Act<\/strong>, which demonstrates that the reasoning component is extremely important for guiding the actions.<\/p>\n On the other hand, ReAct outperforms CoT on FEVER (60.9 vs. 56.3) and slightly lags behind CoT on HotpotQA (27.4 vs. 29.4). On FEVER, acting to retrieve up-to-date knowledge gives the boost needed to make the right SUPPORTS or REFUTES decision.<\/p>\n When comparing CoT and ReAct on HotpotQA to see why their performance is comparable, these are the key observations:<\/p>\n <\/p>\n <\/p>\n I hope this article helped you understand this paper. You can check it out here: https:\/\/arxiv.org\/pdf\/2210.03629.pdf<\/a><\/p>\n Implementations of ReAct already exist here<\/a> and here<\/a>.<\/p>\n
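The interleaved loop that ReAct runs, a thought, then an action, then an observation fed back into the context, can be sketched in a few lines of Python. This is a minimal illustration under assumed names (`run_react`, `llm`, and `env` are hypothetical wrappers, not the paper's code); the action grammar matches the search/lookup/finish scheme described above.

```python
import re

# Minimal sketch of the ReAct loop (hypothetical helper names, not the
# paper's code). The model alternates free-text "Thought" steps with one
# of three actions: search[entity], lookup[string], finish[answer].

def run_react(llm, env, question, max_steps=8):
    """Interleave reasoning traces and actions until finish[...] appears."""
    prompt = f"Question: {question}\n"
    for step in range(1, max_steps + 1):
        # Ask the model for the next thought + action given the trace so far.
        completion = llm(prompt + f"Thought {step}:")
        prompt += f"Thought {step}:{completion}\n"
        match = re.search(r"(search|lookup|finish)\[(.*?)\]", completion)
        if match is None:
            continue  # no action emitted this step; keep reasoning
        action, arg = match.groups()
        if action == "finish":
            return arg  # the model's final answer
        # Execute the action (e.g. against a Wikipedia API) and feed the
        # result back into the context as an observation.
        observation = env(action, arg)
        prompt += f"Observation {step}: {observation}\n"
    return None  # step budget exhausted without an answer
```

Here `llm` would wrap a real model call and `env` would wrap the Wikipedia API; stubbing both with canned strings is enough to see the control flow.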
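To make the \u201cone or two in-context examples\u201d concrete, here is what a single ReAct trajectory in a HotpotQA-style few-shot prompt could look like. The trajectory below is illustrative, written for this article rather than copied from the paper's actual prompt set.

```python
# Illustrative ReAct in-context example (written for this sketch; not a
# verbatim trajectory from the paper's prompts).
EXAMPLE_TRAJECTORY = """\
Question: In what year was the college where Alan Turing studied founded?
Thought 1: I need to search Alan Turing and find where he studied.
Action 1: search[Alan Turing]
Observation 1: Alan Turing studied at King's College, Cambridge.
Thought 2: Now I need to find when King's College, Cambridge was founded.
Action 2: search[King's College, Cambridge]
Observation 2: King's College, Cambridge was founded in 1441 by Henry VI.
Thought 3: King's College was founded in 1441, so the answer is 1441.
Action 3: finish[1441]
"""

def build_prompt(question: str) -> str:
    """Prepend the worked trajectory to a new question, ReAct style."""
    return EXAMPLE_TRAJECTORY + f"\nQuestion: {question}\nThought 1:"
```

The model then continues from the trailing `Thought 1:`, and the same search/lookup/finish grammar can be parsed out of its completions and executed.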
Difference between Reason, Act and ReAct (Figure taken from the paper)<\/span> <\/p>\n\n
How ReAct works and leads to better results (Figure taken from the paper)<\/span> <\/p>\n\n
ReAct and CoT results on different datasets (Figure taken from the paper)<\/span><\/p>\n