A word cloud is a graphical representation of the frequency with which words occur in a section of text. One of the examples in my introductory classes is to create a word cloud from a text file containing the text of one of the students' reading assignments. The underlying assumption is that the more often a word is used, the more important it is to the topic. To meet that assumption, we have to remove common words such as “the”, “and”, “a”, etc., and combine different forms of a word, such as plural and singular versions (at least for the important terms). The example used below includes the text from an introductory chapter on Plate Tectonics that is assigned early in the course. The overall goal is to show students a use of computational thinking (and computing) that helps in the analysis of something other than mathematical computations.

Before making the word cloud, we have to scrub the text. This means removing non-words such as numbers and removing words that carry little information about the topic, such as “the”. Mathematica calls these terms stop words. I use the following code to make a word cloud and then adapt some of the scrubbing functions to better clean the specific text that I am processing. I am sure there are more elegant ways to do this with Mathematica, but this works and, for simple applications, does not require too much effort.

I created the text file from the scanned images of the textbook using the Mac OS X tool PDFpenPro. I copied and pasted the text into an editor and saved the file. We then load the text into a string variable called d0, which we can use to manipulate the text.

(* You will have to replace the path to the text file using … *)
(* We load and store the text into an item we'll call d0 *)
d0 = Import["…"]

Replace common plurals in the text

Edit the list of important plural words so that a term's importance is not diminished because it is sometimes pluralized. I examined the initial word cloud to identify plural and singular terms that should be combined. Then I added them to the command below, which replaces plurals in the text with their singular forms.

d0 = StringReplace[d0, …]

Delete uninformative words in the text

Not all the words in the file carry useful information for a word cloud. I first convert the string d0 to a list of words with TextWords, then convert it to lower case with ToLowerCase. We can then use the DeleteStopwords command to remove common words like “the”, “is”, etc.
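The cleaning steps described in the post can be sketched as a single pipeline: merge plurals, split into words, lower-case, drop stop words, then build the cloud. This is a minimal sketch and not the author's original code. The helper names singularize and cleanWords, the file path, and the list of plural pairs are hypothetical placeholders. StringReplace, TextWords, ToLowerCase, DeleteStopwords and WordCloud are standard Wolfram Language functions. As one possible way of removing the numbers mentioned in the post, the sketch also drops any token containing a digit.

```mathematica
(* Sketch of the word-cloud cleaning pipeline. Helper names, the path,
   and the plural pairs below are hypothetical, not the author's. *)

(* Replace whole-word plurals with singular forms. WordBoundary keeps
   "plates" from matching inside longer words such as "subplates";
   IgnoreCase -> True also catches capitalized forms such as "Plates". *)
singularize[text_String, plurals : {___Rule}] :=
  StringReplace[text,
    (WordBoundary ~~ #1 ~~ WordBoundary -> #2) & @@@ plurals,
    IgnoreCase -> True]

(* Merge plurals, split into words, lower-case them, remove stop words,
   then drop tokens containing digits (page numbers, years, figure numbers). *)
cleanWords[text_String, plurals : {___Rule}] :=
  Select[
    DeleteStopwords[ToLowerCase[TextWords[singularize[text, plurals]]]],
    StringFreeQ[#, DigitCharacter] &]

(* Hypothetical plural list: extend it after inspecting a first cloud. *)
plurals = {"plates" -> "plate", "earthquakes" -> "earthquake",
   "volcanoes" -> "volcano"};

(* Hypothetical path: replace it with the location of your own text file. *)
path = FileNameJoin[{$HomeDirectory, "plate-tectonics.txt"}];

If[FileExistsQ[path],
  words = cleanWords[Import[path, "Text"], plurals];
  (* The most frequent terms help when spotting more plurals to merge *)
  Print[TakeLargest[Counts[words], 10]];
  WordCloud[words]]
```

Printing the most frequent terms before looking at the cloud makes the iterative step in the post easier. You can read off plural and singular pairs, add them to plurals, and rerun.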