iphone: Force wordcloud python module to include all words

Monday, 31 August 2015

I'm using Andreas Mueller's wordcloud module for Python to visualize the results of a survey my students will complete. It is a brilliant module that produces very nice pictures, but I can't get it to recognize all words, even with stopwords=None and ranks_only=True. The survey responses are one to three words long and may contain hyphens.

Here is an example. First I import the dependencies in my Jupyter notebook:

%matplotlib inline
import matplotlib.pyplot as plt
from wordcloud import WordCloud

Then suppose I put all the responses into a string:

words = "do do do do do do do do do do re re re re re mi mi fa fa fa fa fa fa fa fa fa fa-so fa-so fa-so fa-so fa-so so la ti do"

Then I execute the plot:

wordcloud = WordCloud(ranks_only=True, stopwords=None).generate(words)
plt.imshow(wordcloud)
plt.axis('off')
plt.show()

But for some reason it ignores "do" and "fa-so" despite their high frequency.

Any tips? Besides "don't use a word cloud". It is a silly survey and it invites a silly visualization. Thanks.
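One possible workaround, sketched under a few assumptions: as far as I can tell from the library source, stopwords=None falls back to the built-in stop-word list (which contains "do"), and the default tokenizer splits on hyphens, so "fa-so" is counted as "fa" and "so". Counting the tokens yourself and passing the counts to generate_from_frequencies sidesteps both steps. The helper names count_responses and draw_cloud are made up for illustration, and passing a dict assumes a reasonably recent wordcloud release (older ones expected a list of (word, count) tuples).

```python
from collections import Counter

words = ("do do do do do do do do do do re re re re re mi mi "
         "fa fa fa fa fa fa fa fa fa fa-so fa-so fa-so fa-so fa-so "
         "so la ti do")


def count_responses(text):
    """Count whitespace-separated tokens, keeping hyphenated ones whole."""
    return Counter(text.split())


def draw_cloud(frequencies):
    """Plot a cloud from precomputed counts (hypothetical helper)."""
    # Imported lazily so the counting step runs without the plotting libraries.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    # Assumption: this wordcloud version accepts a dict of word -> count.
    # No stop-word filtering or tokenizing is applied to these counts.
    cloud = WordCloud().generate_from_frequencies(dict(frequencies))
    plt.imshow(cloud)
    plt.axis('off')
    plt.show()


freqs = count_responses(words)
# draw_cloud(freqs)  # uncomment in the notebook to render the plot
```

Splitting on whitespace only works here because every response in the example is a single token. For real two- or three-word responses, it would make more sense to keep one response per line and count text.splitlines() instead.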
