Monday, 31 August 2015

Force wordcloud python module to include all words

I'm using Andreas Mueller's wordcloud module for Python to visualize the results of a survey my students will complete. It is a brilliant module that produces very nice pictures, but I am having trouble making it recognize all words, even when setting stopwords=None and ranks_only=True. The survey responses are between one and three words long and may contain hyphens.

Here is an example. First I import the dependencies in my Jupyter notebook:

import matplotlib.pyplot as plt
%matplotlib inline
from wordcloud import WordCloud
from scipy.misc import imread

Then suppose I put all the responses into a string:

words = "do do do do do do do do do do re re re re re mi mi fa fa fa fa fa fa fa fa fa fa-so fa-so fa-so fa-so fa-so so la ti do"

Then I execute the plot:

wordcloud = WordCloud(ranks_only=True, stopwords=None).generate(words)
plt.imshow(wordcloud)
plt.axis('off')
plt.show()

But for some reason it ignores "do" and "fa-so" despite their high frequency.
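One way to investigate is to look at how the string is tokenized before anything is counted. The sketch below uses only the standard library and assumes a token pattern of r"\w[\w']+", which is the default the wordcloud documentation gives for the regexp parameter in recent releases; the 2015 release may use a slightly different pattern, so treat ASSUMED_TOKEN_PATTERN as an illustration rather than the module's exact behaviour.

```python
import re

words = "do do do do do do do do do do re re re re re mi mi fa fa fa fa fa fa fa fa fa fa-so fa-so fa-so fa-so fa-so so la ti do"

# Assumption: the tokenizer keeps word characters and apostrophes only,
# so a hyphen acts as a separator between two tokens.
ASSUMED_TOKEN_PATTERN = r"\w[\w']+"

regex_tokens = re.findall(ASSUMED_TOKEN_PATTERN, words)
whitespace_tokens = words.split()

# "fa-so" survives a plain whitespace split but not the regex split.
print("fa-so" in whitespace_tokens)
print("fa-so" in regex_tokens)

# Under the regex, each "fa-so" is counted once as "fa" and once as "so".
print(regex_tokens.count("fa"), regex_tokens.count("so"))
```

If the pattern assumption holds, this would account for the missing "fa-so". As for "do", the documentation describes stopwords=None as falling back to the built-in STOPWORDS list, which contains "do", so passing an empty set (stopwords=set()) may be worth trying.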

Any tips, besides "don't use a word cloud"? It is a silly survey and it invites a silly visualization. Thanks.

Update

I am still unable to include hyphenated words (e.g. "fa-so"); they just drop out.
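A possible workaround, sketched here under some assumptions, is to count the responses yourself and hand the counts to the module instead of letting it tokenize the text. The counting uses collections.Counter from the standard library. The helper build_cloud is a hypothetical name, and it assumes a wordcloud release whose generate_from_frequencies accepts a dict of word counts (older releases expected a list of (word, frequency) tuples).

```python
from collections import Counter

words = "do do do do do do do do do do re re re re re mi mi fa fa fa fa fa fa fa fa fa fa-so fa-so fa-so fa-so fa-so so la ti do"

# Split on whitespace only, so hyphenated responses such as "fa-so" stay whole
# and no stopword filtering is applied.
frequencies = Counter(words.split())
print(frequencies.most_common(3))


def build_cloud(freqs):
    """Hypothetical helper: render a cloud from precomputed counts."""
    # Imported lazily so the counting step above runs without the package.
    from wordcloud import WordCloud
    # Assumption: this release accepts a dict mapping word -> count.
    return WordCloud().generate_from_frequencies(dict(freqs))
```

In the notebook, plt.imshow(build_cloud(frequencies)) would then replace the generate(words) call. For responses of two or three words, splitting on the delimiter between responses (for example one response per line) instead of on whitespace would keep each response as a single entry.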



via Chebli Mohamed
