{"id":1172540,"date":"2021-10-16T08:01:10","date_gmt":"2021-10-16T12:01:10","guid":{"rendered":"https:\/\/www.analyticsvidhya.com\/?p=85362"},"modified":"2021-10-16T08:01:10","modified_gmt":"2021-10-16T12:01:10","slug":"cleaning-and-pre-processing-textual-data-with-neattext-library","status":"publish","type":"station","link":"https:\/\/platodata.io\/plato-data\/cleaning-and-pre-processing-textual-data-with-neattext-library\/","title":{"rendered":"Cleaning and Pre-processing textual data with NeatText library"},"content":{"rendered":"
\n
Raw, unstructured text is hard to use directly when solving NLP problems, so it has to be pre-processed before an effective NLP model can be built on it. Machine learning models accept only numeric inputs, so the text (string objects) must eventually be converted into numeric values such as \u2018int\u2019 objects. There are many ways to pre-process text. One way is to hard-code every step and run the text data through that code. Another is to use a Natural Language Processing package that does the work for us with simple commands. One such package is NeatText<\/a>.<\/p>\n NeatText is a simple Natural Language Processing package for cleaning and pre-processing text data. It can be used to clean sentences and to extract emails, phone numbers, weblinks, and emojis from them. It can also be used to set up text pre-processing pipelines.<\/span><\/p>\n This library is intended to solve the following problems:<\/p>\n In this article, we shall explore the different components and functionalities of this package using examples. First, let us look at the components of this package.<\/p>\n This library offers four components, also called objects. They are:<\/p>\nTable of Contents<\/h3>\n
\n
What is NeatText<\/h2>\n
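For contrast with what such a package automates, the hard-coded route could look roughly like the sketch below. It uses only Python's standard re module and does not call NeatText. The patterns and the helper names (extract_emails, clean_text, to_counts) are assumptions made for this illustration, and the regular expressions are deliberately simplistic.

```python
import re

# Deliberately simple, hypothetical patterns for illustration only;
# real-world emails and links are messier than these expressions allow.
EMAIL_PATTERN = re.compile('[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+[.][A-Za-z]{2,}')
URL_PATTERN = re.compile('https?://[^ ]+')
NON_ALNUM_PATTERN = re.compile('[^a-z0-9 ]+')


def extract_emails(text):
    # Return every substring that looks like an email address.
    return EMAIL_PATTERN.findall(text)


def clean_text(text):
    # Lowercase and collapse every run of whitespace into a single space.
    text = ' '.join(text.lower().split())
    # Replace emails, links, and remaining non-alphanumeric runs with a space.
    text = EMAIL_PATTERN.sub(' ', text)
    text = URL_PATTERN.sub(' ', text)
    text = NON_ALNUM_PATTERN.sub(' ', text)
    # Collapse the spaces left behind by the substitutions.
    return ' '.join(text.split())


def to_counts(text):
    # Turn cleaned text into numbers: a token -> occurrence-count mapping.
    counts = {}
    for token in clean_text(text).split():
        counts[token] = counts.get(token, 0) + 1
    return counts


sample = 'Contact us at info@example.com or visit https://example.com/docs NOW!!!'
print(extract_emails(sample))       # ['info@example.com']
print(clean_text(sample))           # contact us at or visit now
print(to_counts('spam spam eggs'))  # {'spam': 2, 'eggs': 1}
```

Every further requirement, such as phone numbers, emojis, or stop words, would need another hand-written pattern, which is the kind of repetitive work a dedicated cleaning package is meant to take over.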
\n
Components of NeatText<\/h2>\n
\n