Data cleaning: Text

Cleaning categorical data

One-hot encoding

Cleaning text data

Bag-of-words

N-grams

Introduction

We can add start-of-sentence and end-of-sentence markers: * and STOP.

Punctuation is generally removed.
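
The two notes above can be combined into a minimal sketch that strips punctuation, adds the * and STOP markers, and extracts n-grams. The function names `tokenize` and `ngrams` are hypothetical. Lowercasing the text and padding with n - 1 start markers are assumptions (a common convention), not something these notes specify.

```python
import string

# Marker tokens named in the notes: "*" for sentence start, "STOP" for sentence end.
START, STOP = "*", "STOP"


def tokenize(sentence):
    """Lowercase (an assumption), strip ASCII punctuation, split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return sentence.lower().translate(table).split()


def ngrams(sentence, n=2):
    """Return the n-grams of a sentence as tuples, padded with the markers.

    Assumption: n - 1 start markers are prepended and one STOP is appended,
    so every word appears in an n-gram with a full left context.
    """
    tokens = [START] * (n - 1) + tokenize(sentence) + [STOP]
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


print(ngrams("The cat sat."))
# [('*', 'the'), ('the', 'cat'), ('cat', 'sat'), ('sat', 'STOP')]
```

Only ASCII punctuation is removed here; text with other punctuation marks would need a broader rule.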

Feature hashing