Organizing Text for Analysis

Text Explorer is useful for exploring unstructured text, such as survey free-response fields or incident reports, to understand its meaning. This is often an iterative process, where you alternate between curating and analyzing a list of terms and phrases.

See how to:

Use the Text Explorer input menu to start exploring text
Understand data used in Hotel Review case study
Understand and refine Term and Phrase Lists and Word Cloud
Use Data Filter to explore Word Cloud results
Use JMP Pro for Sentiment Analysis and Term Selection
Use Multivariate Embedding to graph text results into two or three groups to understand related results

Kemal @kemal_oflus, Olivia @O_Lippincott and Ruth @ruthhummel answered questions after the June 2023 demo.

Q: Have you considered adding a feature for "coded" text/cryptography in order to help recognize and analyze patterns in coded text?

A: Some of this can be addressed by using custom Regex. There is an option for this on that initial launch window. There, you can teach JMP how to identify many patterns and what to do when it finds them.
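To illustrate the general idea outside JMP, here is a minimal Python sketch that treats a coded identifier as a single pattern to find. The code format (two uppercase letters, a hyphen, four digits) and the sample text are assumptions made up for this example; the Regex editor in JMP has its own interface and options.

```python
import re

# Hypothetical code format: two uppercase letters, a hyphen, four digits (e.g. "AB-1234").
CODE_PATTERN = re.compile(r"\b[A-Z]{2}-\d{4}\b")

def find_codes(text):
    """Return every substring of text that matches the assumed code format."""
    return CODE_PATTERN.findall(text)

# Invented sample text for illustration only.
sample = "Pump failed with fault QX-0042 after restart; see also ZR-1187."
codes = find_codes(sample)
print(codes)
```

Once a pattern like this is recognized, each match can be handled as one term instead of being split at the hyphen.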

Q: Can you do a word cloud for the phrases instead of the words?

A: The word cloud is generated from the term list. If you add phrases to the term list like Kemal did in the example, the word cloud will also show the phrases.

Q: Do you find that you change the default text explorer options (maximum characters per word, stemming, maximum words per phrase)? I usually leave these alone and wonder if it is important to change them.

A: In most cases I leave them alone except for minimum characters per word. Subject matter experts may have inputs here. If you have some unique terms, you can customize a Regular Expression to handle them. See video below.

(view in My Videos)

Q: Is Sentiment Analysis only in JMP Pro?

A: Yes, see more about Sentiment Analysis using JMP Pro.

Q: Do you have any tips for working with custom Regular Expressions in JMP text analysis? I find that I spend a lot of time trying to do this and get limited results.

A: We offer a course on Text Analysis that you might find useful. It includes various examples of how to use Regex. There is also a post about Regex for cleaning data that will be published in a JMP Community Blog.

Q: t-SNE plot: what do the x-y numbers mean?

A: For that I would recommend watching a video on multivariate method clustering or on teaching clustering. In this case, we had 25 different columns and reduced that multi-dimensional space down to 2. Think of this as similar to principal components. You can choose how many outputs (or components) you want.

Multivariate Output Options.JPG

Q: Sometimes there is red text in the phrase list. What does that signify?

A: It identifies a default color in our library.

Phrase Colors.JPG

Questions answered at a previous demo on this topic:

Q: Do you have to pre-configure the Word cloud to ignore super common words, like "the"?

A: There is a list of "stop words", super common words like "the", that JMP excludes from the analysis. See documentation on using stop words and related topics.
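The effect of stop words can be sketched in a few lines of Python. This is only an illustration of the concept, not how JMP implements it; the stop word list and the sample reviews below are assumptions made up for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop word list; JMP ships its own, much longer list.
STOP_WORDS = {"the", "a", "and", "was", "is", "of", "to"}

def term_counts(documents, stop_words=STOP_WORDS):
    """Count words across all documents, skipping any stop word."""
    counts = Counter()
    for doc in documents:
        for word in re.findall(r"[a-z']+", doc.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts

# Invented sample reviews.
reviews = ["The room was clean and the staff was friendly.", "The staff is helpful."]
counts = term_counts(reviews)
print(counts.most_common(3))
```

Without the filter, "the" would be the most frequent word and would dominate the cloud.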

Q: Can we import delimited txt files and analyze columns individually?

A: JMP can import delimited text files into a JMP data table. You could analyze multiple text columns individually which would give a separate output for each text column.

Q: In his example, if red color shows the most frequent word, why isn't the biggest word red?

A: The average score column in the data table is the basis for coloring in his example. Size reflects how frequent the word/term is. In this example, Kemal specified that this word cloud coloring should be controlled by the Average score column from the data table. So Average_score doesn’t come from JMP, but from the data table. See more about Word Cloud options.

(view in My Videos)
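The difference between size and color in that answer can be sketched in Python: size follows how often a term occurs, while color follows a score column supplied with the data. This is a sketch under assumptions; the sample rows are invented, and averaging the score over each occurrence of a term is an assumed summary, not a statement of how JMP computes the color.

```python
from collections import defaultdict

def term_size_and_color(rows):
    """rows: iterable of (text, score) pairs.

    Returns {term: (frequency, mean score over the term's occurrences)}.
    Frequency would drive word size; the mean score would drive color.
    """
    freq = defaultdict(int)
    score_sum = defaultdict(float)
    for text, score in rows:
        for term in text.lower().split():
            freq[term] += 1
            score_sum[term] += score
    return {term: (freq[term], score_sum[term] / freq[term]) for term in freq}

# Invented (review text, average score) rows.
rows = [("great location", 9.0), ("noisy location", 5.0), ("great breakfast", 8.0)]
stats = term_size_and_color(rows)
for term, (count, mean_score) in sorted(stats.items()):
    print(term, count, mean_score)
```

Here "great" and "location" have the same frequency but different mean scores, so they would be drawn at the same size in different colors.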

Q: What are tokens?

A: Tokens are Words/Terms plus Phrases/Cases, or a combination of those. In this case I had 15792 Terms, 59360 Cases (or Phrases) and about 17 of these per row (Total Tokens per Case). See more details.

The Tokenizing stage converts text to lowercase, then applies a ‘Tokenizing’ method (either Basic Words or Regex) to group characters into tokens, and then it recodes tokens based on specified recode definitions. Note that recoding occurs before stemming and recode operations are processed internally in one pass regardless of the order that they are specified in the report window. See more about the Text Processing Steps.
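The three steps described above (lowercase, tokenize, recode) can be sketched in Python as follows. This is an illustrative approximation, not JMP's implementation: the token pattern and the recode definitions are assumptions, and the single dictionary lookup per token is just one way to model recodes being applied in one pass.

```python
import re

# Hypothetical recode definitions: map variant tokens to one preferred token.
RECODES = {"rm": "room", "rooms": "room", "wifi": "wi-fi"}

# Assumed token pattern: runs of letters/digits, optionally joined by "-" or "'".
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:[-'][a-z0-9]+)*")

def tokenize(text, recodes=RECODES):
    lowered = text.lower()                    # 1. convert text to lowercase
    tokens = TOKEN_PATTERN.findall(lowered)   # 2. group characters into tokens
    # 3. recode: each token is looked up exactly once, so the result does not
    #    depend on the order in which the recode definitions are listed.
    return [recodes.get(token, token) for token in tokens]

tokens = tokenize("Rooms were clean, but the WiFi in my rm was slow.")
print(tokens)
```

Stemming is deliberately left out of this sketch; per the note above, it would run after the recode step.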

Tokens

Q: Can you make a Phrase Cloud rather than a Word Cloud?

A: We contacted JMP Developer, Xan Gregg @XanGregg, after the live webinar. He suggested that the Word Cloud might more accurately be called a Term Cloud. It shows all the words in the Term list. I suppose you could add a lot of Phrases as Terms and remove words by making them stop words, and then you would get a cloud of Phrases. If you are interested in requesting Phrase Clouds, feel free to post your request and rationale to the JMP Software Wish List.

Q: I am really interested in the "document clustering" you mentioned earlier. I have 100s of text documents and want to find which ones belong to similar groups. Where can I find information on this?

A: You can use JMP Pro for Latent Class Analysis and Latent Semantic Analysis.

Latent Analysis Options

See this brief overview of how to use Latent Analyses.

(view in My Videos)

Q: What are documents and how do you import them into JMP?

A: Each row is considered a document, and you can import documents from any file that captures them in rows. For example, you can import the text from Excel. In this example, the reviews were captured as rows in an Excel file.
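The "one row, one document" idea can be illustrated with a short Python sketch. The in-memory CSV below stands in for an exported spreadsheet, and the column names review_id and review_text are hypothetical.

```python
import csv
import io

# Stand-in for an exported spreadsheet; column names and rows are invented.
raw = io.StringIO(
    "review_id,review_text\n"
    "1,Great location\n"
    '2,"Noisy, but clean"\n'
)

# Each row's text field becomes one document.
documents = [row["review_text"] for row in csv.DictReader(raw)]
print(len(documents), documents)
```

Every entry in the resulting list corresponds to one row of the table, which is the unit Text Explorer treats as a document.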

Resources

Overview of Text Explorer (Hint: Work through the documentation pages as a quasi-tutorial.)
Documentation on Predictor Screening
Video on Using Text Explorer to Extend Analysis

gail_massari · 06-13-2022

After the session, JMP user and attendee Mark Anawis @mark_anawis suggested this answer to a question about retrieving 2 words (Phrase) instead of 1 for the Term list. I think you may be able to do this by selecting Tokenizing ‘Regex’ and checking ‘Customize Regex’. This brings up the Text Explorer Regular Expression Editor for the Text Column. You can potentially modify the Regex function to retrieve 2 words instead of 1.
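As a rough illustration of what a two-word pattern does, here is a Python sketch. The pattern is an assumption written for this example and is not the expression that the Text Explorer Regular Expression Editor generates.

```python
import re

# Assumed pattern: two runs of letters separated by whitespace.
TWO_WORDS = re.compile(r"[a-z]+\s+[a-z]+")

def two_word_tokens(text):
    """Return consecutive, non-overlapping two-word tokens from text."""
    return TWO_WORDS.findall(text.lower())

pairs = two_word_tokens("Front desk staff were very helpful")
print(pairs)
```

Because the matches do not overlap, a leftover word at the end of a text with an odd number of words is dropped, so a pattern like this complements the phrase list; it does not replace it.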
