I'm using the wordcloud module in Python by Andreas Mueller to visualize the results of a survey my students will complete. Brilliant module, very nice pictures. However, I have trouble making it recognize all words, even when setting stopwords=None and ranks_only=True. The survey responses are between one and three words long and may contain hyphens.
Here is an example. First I import the dependencies in my Jupyter notebook:
import matplotlib.pyplot as plt
%matplotlib inline
from wordcloud import WordCloud
from scipy.misc import imread
Then suppose I put all the responses into a string:
words = "do do do do do do do do do do re re re re re mi mi fa fa fa fa fa fa fa fa fa fa-so fa-so fa-so fa-so fa-so so la ti do"
Then I execute the plot:
wordcloud = WordCloud(ranks_only = True,stopwords=None).generate(words)
plt.imshow(wordcloud)
plt.axis('off')
plt.show()
But for some reason it ignores "do" and "fa-so" despite their high frequency.
Any tips? Besides "don't use a word cloud". It is a silly survey and it invites a silly visualization. Thanks.
Update
I am still unable to include hyphenated words (e.g. "fa-so"); they just drop out.
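To make the expected result concrete, here is a minimal sketch of the counts the cloud should reflect. It uses only the standard library; the helper name count_responses and the pattern TOKEN_PATTERN are hypothetical, not part of wordcloud. The pattern is an assumption about what counts as a word here: runs of word characters, optionally joined by single hyphens.

```python
import re
from collections import Counter

# Hypothetical pattern (an assumption, not wordcloud's own tokenizer):
# one or more word characters, optionally followed by "-more" groups,
# so "fa-so" stays a single token instead of splitting at the hyphen.
TOKEN_PATTERN = re.compile(r"\w+(?:-\w+)*")


def count_responses(text):
    """Count tokens, treating hyphenated words such as "fa-so" as one word."""
    return Counter(TOKEN_PATTERN.findall(text.lower()))


words = ("do do do do do do do do do do re re re re re mi mi "
         "fa fa fa fa fa fa fa fa fa fa-so fa-so fa-so fa-so fa-so so la ti do")

counts = count_responses(words)

# Print every token with its count, most frequent first.
for token, n in counts.most_common():
    print(token, n)
```

With no stopword filtering applied, both "do" and "fa-so" appear in these counts. Depending on the installed wordcloud version, a mapping like this may be usable with WordCloud.generate_from_frequencies, which takes ready-made frequencies instead of raw text; I have not checked which input type each release expects, so treat that as an assumption.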