Preprocess text files to keep only readable words

Published on Tue Mar 01 2022

For cleaning large text corpora.
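Below is a minimal Python sketch of this kind of preprocessing. It is not the author's method. It assumes a "readable word" is a whitespace-separated token that is made only of ASCII letters after its surrounding punctuation is stripped. One internal apostrophe is allowed, so "don't" is kept. It also assumes that words shorter than `min_len` (2 by default) are dropped and that everything is lowercased. The names `WORD_RE`, `clean_line`, `clean_lines` and `clean_file` are hypothetical. The file is streamed one line at a time, so memory use stays flat even for a very large corpus.

```python
import re
import string
from typing import Iterable, Iterator

# Assumption: a readable word is ASCII letters with at most one internal
# apostrophe. Non-ASCII words (e.g. "café") are dropped; widen the pattern
# if your corpus needs them.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def clean_line(line: str, min_len: int = 2) -> str:
    """Keep only readable words from one line, lowercased and space-joined."""
    kept = []
    for token in line.split():
        # Remove punctuation at either end ("world!" -> "world").
        token = token.strip(string.punctuation)
        # Reject tokens with digits or symbols inside ("x7$9") and short ones.
        if len(token) >= min_len and WORD_RE.fullmatch(token):
            kept.append(token.lower())
    return " ".join(kept)


def clean_lines(lines: Iterable[str], min_len: int = 2) -> Iterator[str]:
    """Lazily clean lines, skipping any that end up with no words."""
    for line in lines:
        cleaned = clean_line(line, min_len)
        if cleaned:
            yield cleaned


def clean_file(src_path: str, dst_path: str, min_len: int = 2) -> int:
    """Stream src_path into dst_path line by line; return lines written."""
    written = 0
    # errors="ignore" silently drops bytes that are not valid UTF-8.
    with open(src_path, encoding="utf-8", errors="ignore") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for cleaned in clean_lines(src, min_len):
            dst.write(cleaned + "\n")
            written += 1
    return written
```

For example, `clean_line("Hello, world! x7$9 don't a")` returns `"hello world don't"`. Streaming through a generator keeps only one line in memory at a time, which is why this pattern suits large corpora.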