
Data Pre-processing and Clearance

Unnecessary data and attributes in the posts must be removed before analysis. Before this pre-processing, however, duplicate posts were removed, and posts containing slang, inappropriate language, or hate speech were detected and removed.

It was observed that users made many posts during the period in question, relevant or not, in order to stand out in the trending topics. The word groups that appeared in these posts were identified, and the posts were excluded from the data set. In addition, it was observed that some posts contained the keywords determined when the data set was built and might constitute a crime. Rules were created for these posts, and the relevant posts were excluded from the data set under these rules. To simplify the analysis, unnecessary words and characters were removed from the posts. Accordingly, the following operations were performed on the data set.

• Uppercase letters were converted to lowercase so that differences in capitalization would not produce spurious differences in the analysis.

• Non-alphabetic characters (special characters, numbers, etc.) contained in the posts were removed from the data set.

• URL information in the post, if any, was removed.

• Stop words, which are commonly used in every language, were removed from the data set. The NLTK (2020) library was used to exclude these words from the data set.
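The operations listed above, together with the earlier removal of duplicate posts, can be sketched in Python as follows. This is a minimal illustration rather than the authors' actual code: the function names `clean_post` and `drop_duplicates`, the regular expressions, and the small hard-coded stop-word set are all assumptions made for the example. In practice the stop-word set could be replaced with the list returned by NLTK's `stopwords.words("english")`, which requires the NLTK stop-word corpus to be downloaded first. The sketch also assumes English text in ASCII letters, and it strips URLs before other characters so that a link is removed whole instead of leaving fragments behind.

```python
import re

# Small illustrative stop-word set (an assumption for this sketch).
# With NLTK installed and its corpus downloaded, this could be replaced by
# set(nltk.corpus.stopwords.words("english")).
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "on", "of", "to", "and", "at"}

# Hypothetical patterns: a simple URL matcher and "anything that is not
# a lowercase ASCII letter or whitespace".
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_PATTERN = re.compile(r"[^a-z\s]")


def clean_post(text, stop_words=STOP_WORDS):
    """Apply the four cleaning operations to a single post."""
    text = text.lower()                       # 1. lowercase everything
    text = URL_PATTERN.sub(" ", text)         # 2. strip URLs while still intact
    text = NON_LETTER_PATTERN.sub(" ", text)  # 3. drop special characters and digits
    tokens = [tok for tok in text.split() if tok not in stop_words]  # 4. stop words
    return " ".join(tokens)


def drop_duplicates(posts):
    """Remove exact duplicate posts, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for post in posts:
        if post not in seen:
            seen.add(post)
            unique.append(post)
    return unique
```

A typical use would be `cleaned = [clean_post(p) for p in drop_duplicates(raw_posts)]`, where `raw_posts` is a hypothetical list of post strings.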


Source: Açıkgoz B., Acar İ.A.. Pandemnomics: The Pandemic's Lasting Economic Effects. Singapore: Springer,2022. — 290 p.. 2022