This website uses Cookies. Click Accept to agree to our website's cookie use as described in our Privacy Policy. Click Preferences to customize your cookie settings.
Practice JMP using these webinar videos and resources. We hold live Mastering JMP Zoom webinars with Q&A most Fridays at 2 pm US Eastern Time. See the list and register. Local-language live Zoom webinars occur in the UK, Western Europe and Asia. See your country jmp.com/mastering site.
Created:
Jun 13, 2022 12:50 PM
| Last Modified: Oct 24, 2023 03:44 PM
TextAnalyticsJournal.jrn
See how to:
Organizing Text for Analysis
Text Explorer is useful for exploring unstructured text, such as survey free-response fields or incident reports, to understand its meaning. This is often an iterative process, where you alternate between curating and analyzing a list of terms and phrases.
See how to:
Use Text Exlorer input menu to start exploring text
Understand data used in Hotel Review case study
Understand and refine Term and Phrase Lists and Word Cloud
Use Data Filter to explore Word Cloud results
Use JMP Pro for Sentiment Analysis and Term Selection
Use Multivariate Embedding to graph text results into two or three groups to understand related results
Q: Have you considered adding a feature for "coded" text/cryptography in order to help recognize and analyze patterns in coded text?
A: Some of this can be addressed by using custom Regex. There is an option for this on that initial launch window. There, you can teach JMP how to identify many patterns and what to do when it finds them.
Q: Can you do a word cloud for the phrases instead of the words?
A: The word cloud is using the term list to generate the word cloud. If you add phrases to the term list like Kemal did in the example, the word cloud will also show the phrases.
Q: Do you find that you change the default text explorer options (maximum characters per word, stemming, maximum words per phrase)? I usually leave these alone and wonder if it is important to change them.
A: In most cases I leave them alone except for minimum characters per word. Subject matter experts may have inputs here. If you want to customize a Regular Expression and you have some unique terms, you can customize See video below.
Q: Do you have any tips with working with custom Regular Expressions with JMP and text analysis. I find that I spend a lot of time trying to do this and get limited results.
A: We offer a course on Text Analysis that you might find useful. It includes various examples of how to use Regex. There is a JMP blog post about Regex for cleaning data that will be posting in a JMP Community Blog.
Q: t-SNE plot: what does x-y #s mean?
A: For that I would recommend watching a video on multivariate method clustering or on teaching clustering. In this case, we had 25 different columns and reduced that multi-dimensional space down to 2. Think of this as similar to principal components. You can choose how many outputs (or components) you want.
Q: Sometimes there is red text in the phrase list. What does that signify?
Q: Can we import delimited txt files and analyze columns individually?
A: JMP can import delimited text files into a JMP data table. You could analyze multiple text columns individually which would give a separate output for each text column.
Q: In his example, if red color shows the most frequent word, why isn't the biggest word "red"
A: The average score column in the data table is the basis for coloring in his example. The size is the most frequent word/term. In this example, Kemal specified that this word cloud coloring should be controlled by the Average score column from the data table. So Average_score doesn’t come from JMP, but from the data table. See more about Word Cloud options.
A: Tokens are Words/Terms plus Phrases/Cases or a combination of those . In this case I had 15792 Terms, 59360 Cases (or Phrases) and about 17 of these per row (Total Tokens per Case). See more details.
The Tokenizing stage converts text to lowercase, then applies a ‘Tokenizing’ method (either Basic Words or Regex) to group characters into tokens, and then it recodes tokens based on specified recode definitions. Note that recoding occurs before stemming and recode operations are processed internally in one pass regardless of the order that they are specified in the report window. See more about the Text Processing Steps.
Tokens
Q: Can you make a Phrase Cloud rather than a Word Cloud?
A: We contacted JMP Developer, Xan Gregg @XanGregg, after the live webinar. He suggested that the Word Cloud might more accurately be called a Term Cloud. It shows all the words in the Term list. I suppose you could add a lot of Phrases as Terms and remove words by making them stop words, and then you would get a cloud of Phrases. If you are interested in requesting Phrase Clouds, feel free to post request and rationale to the JMP Software Wish List.
Q: I am really interested in the "document clustering" you mentioned earlier. Where can I find information on this where among 100s of text documents and I want to find which ones are in a similar group?
A: What are documents and how do you import them into JMP?
Q: Each row is considered a document and you import it from any file that captures it in rows. For example, you can import the text from Excel. In this example, the reviews were captured as rows in an Excel file.
After the session, JMP user and attendee Mark Anawis @mark_anawis suggested this answer to a question about retrieving 2 words (Phrase) instead of 1 for the Term list. I think you may be able to do this by selecting Tokenizing ‘Regex’ and checking ‘Customize Regex’. This brings up the Text Explorer Regular Expression Editor for the Text Column. You can potentially modify the Regex function to retrieve 2 words instead of 1.
Recommended Articles
'
var data = div.getElementsByClassName("video-js");
var script = document.createElement('script');
script.src = "https://players.brightcove.net/" + data_account + "/" + data_palyer + "_default/index.min.js";
for(var i=0;i< data.length;i++){
videodata.push(data[i]);
}
}
}
for(var i=0;i< videodata.length;i++){
document.getElementsByClassName('lia-vid-container')[i].innerHTML = videodata[i].outerHTML;
document.body.appendChild(script);
}
}
catch(e){
}
/* Re compile html */
$compile(rootElement.querySelectorAll('div.lia-message-body-content')[0])($scope);
}
if (code_l.toLowerCase() != newBody.getAttribute("slang").toLowerCase()) {
/* Adding Translation flag */
var tr_obj = $filter('filter')($scope.sourceLangList, function (obj_l) {
return obj_l.code.toLowerCase() === newBody.getAttribute("slang").toLowerCase()
});
if (tr_obj.length > 0) {
tr_text = "This post originally written in lilicon-trans-text has been computer translated for you. When you reply, it will also be translated back to lilicon-trans-text.".replace(/lilicon-trans-text/g, tr_obj[0].title);
try {
if ($scope.wootMessages[$rootScope.profLang] != undefined) {
tr_text = $scope.wootMessages[$rootScope.profLang].replace(/lilicon-trans-text/g, tr_obj[0].title);
}
} catch (e) {
}
} else {
//tr_text = "This message was translated for your convenience!";
tr_text = "This message was translated for your convenience!";
}
try {
if (!document.getElementById("tr-msz-" + value)) {
var tr_para = document.createElement("P");
tr_para.setAttribute("id", "tr-msz-" + value);
tr_para.setAttribute("class", "tr-msz");
tr_para.style.textAlign = 'justify';
var tr_fTag = document.createElement("IMG");
tr_fTag.setAttribute("class", "tFlag");
tr_fTag.setAttribute("src", "/html/assets/lingoTrFlag.PNG");
tr_fTag.style.marginRight = "5px";
tr_fTag.style.height = "14px";
tr_para.appendChild(tr_fTag);
var tr_textNode = document.createTextNode(tr_text);
tr_para.appendChild(tr_textNode);
/* Woot message only for multi source */
if(rootElement.querySelector(".lia-quilt-forum-message")){
rootElement.querySelector(".lia-quilt-forum-message").appendChild(tr_para);
} else if(rootElement.querySelector(".lia-message-view-blog-topic-message")) {
rootElement.querySelector(".lia-message-view-blog-topic-message").appendChild(tr_para);
} else if(rootElement.querySelector(".lia-quilt-blog-reply-message")){
rootElement.querySelector(".lia-quilt-blog-reply-message").appendChild(tr_para);
} else if(rootElement.querySelector(".lia-quilt-tkb-message")){
rootElement.querySelector(".lia-quilt-tkb-message").appendChild(tr_para);
} else if(rootElement.querySelector(".lia-quilt-tkb-reply-message")){
rootElement.querySelector(".lia-quilt-tkb-reply-message").insertBefore(tr_para,rootElement.querySelector(".lia-quilt-row.lia-quilt-row-footer"));
} else if(rootElement.querySelector(".lia-quilt-idea-message")){
rootElement.querySelector(".lia-quilt-idea-message").appendChild(tr_para);
}else if(rootElement.querySelector(".lia-quilt-column-alley-left")){
rootElement.querySelector(".lia-quilt-column-alley-left").appendChild(tr_para);
}
else {
if (rootElement.querySelectorAll('div.lia-quilt-row-footer').length > 0) {
rootElement.querySelectorAll('div.lia-quilt-row-footer')[0].appendChild(tr_para);
} else {
rootElement.querySelectorAll('div.lia-quilt-column-message-footer')[0].appendChild(tr_para);
}
}
}
} catch (e) {
}
}
} else {
/* Do not display button for same language */
// syncList.remove(value);
var index = $scope.syncList.indexOf(value);
if (index > -1) {
$scope.syncList.splice(index, 1);
}
}
}
}
}
}
angular.forEach(mszList_l, function (value) {
if (document.querySelectorAll('div.lia-js-data-messageUid-' + value).length > 0) {
var rootElements = document.querySelectorAll('div.lia-js-data-messageUid-' + value);
}else if(document.querySelectorAll('.lia-occasion-message-view .lia-component-occasion-message-view').length >0){
var rootElements = document.querySelectorAll('.lia-occasion-message-view .lia-component-occasion-message-view')[0].querySelectorAll('.lia-occasion-description')[0];
}else {
var rootElements = document.querySelectorAll('div.message-uid-' + value);
}
angular.forEach(rootElements, function (rootElement) {
if (value == '507998' && "TkbArticlePage" == "TkbArticlePage") {
rootElement = document.querySelector('.lia-thread-topic');
}
/* V1.1 Remove from UI */
if (document.getElementById("tr-msz-" + value)) {
document.getElementById("tr-msz-" + value).remove();
}
if (document.getElementById("tr-sync-" + value)) {
document.getElementById("tr-sync-" + value).remove();
}
/* XPath expression for subject and Body */
var lingoRBExp = "//lingo-body[@id = " + "'lingo-body-" + value + "'" + "]";
lingoRSExp = "//lingo-sub[@id = " + "'lingo-sub-" + value + "'" + "]";
/* Get translated subject of the message */
lingoRSXML = doc.evaluate(lingoRSExp, doc, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
for (var i = 0; i < lingoRSXML.snapshotLength; i++) {
/* Replace Reply/Comment subject with transalted subject */
var newSub = lingoRSXML.snapshotItem(i);
/*** START : extracting subject from source if selected language and source language is same **/
var sub_L = "";
if (newSub.getAttribute("slang").toLowerCase() == code_l.toLowerCase()) {
if (value == '507998') {
sub_L = decodeURIComponent($scope.sourceContent[value].subject);
}
else{
sub_L = decodeURIComponent($scope.sourceContent[value].subject);
}
} else {
sub_L = newSub.innerHTML;
}
/*** End : extracting subject from source if selected language and source language is same **/
/* This code is placed to remove the extra meta tag adding in the UI*/
try{
sub_L = sub_L.replace('<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />','');
}
catch(e){
}
// if($scope.viewTrContentOnly || (newSub.getAttribute("slang").toLowerCase() != code_l.toLowerCase())) {
if ($scope.viewTrContentOnly) {
if ("TkbArticlePage" == "IdeaPage") {
if (value == '507998') {
if( (sub_L != "") && (sub_L != undefined) && (sub_L != "undefined") ){
document.querySelector('.MessageSubject .lia-message-subject').innerHTML = sub_L;
}
}
}
if ("TkbArticlePage" == "TkbArticlePage") {
if (value == '507998') {
if( (sub_L != "") && (sub_L != undefined) && (sub_L != "undefined") ){
var subTkbElement = document.querySelector('.lia-thread-subject');
if(subTkbElement){
document.querySelector('.lia-thread-subject').innerHTML = sub_L;
}
}
}
}
else if ("TkbArticlePage" == "BlogArticlePage") {
if (value == '507998') {
try {
if((sub_L != "") && (sub_L!= undefined) && (sub_L != "undefined")){
var subElement = rootElement.querySelector('.lia-blog-article-page-article-subject');
if(subElement) {
subElement.innerText = sub_L;
}
}
} catch (e) {
}
/* var subElement = rootElement.querySelectorAll('.lia-blog-article-page-article-subject');
for (var subI = 0; subI < subElement.length; subI++) {
if((sub_L != "") && (sub_L!= undefined) && (sub_L != "undefined")){
subElement[subI].innerHTML = sub_L;
}
} */
}
else {
try {
// rootElement.querySelectorAll('.lia-blog-article-page-article-subject').innerHTML= sub_L;
/** var subElement = rootElement.querySelectorAll('.lia-blog-article-page-article-subject');
for (var j = 0; j < subElement.length; j++) {
if( (sub_L != "") && (sub_L != undefined) && (sub_L != "undefined") ){
subElement[j].innerHTML = sub_L;
}
} **/
} catch (e) {
}
}
}
else {
if (value == '507998') {
try{
/* Start: This code is written by iTalent as part of iTrack LILICON - 98 */
if( (sub_L != "") && (sub_L != undefined) && (sub_L != "undefined") ){
if(document.querySelectorAll('.lia-quilt-forum-topic-page').length > 0){
if(rootElement.querySelector('div.lia-message-subject').querySelector('h5')){
rootElement.querySelector('div.lia-message-subject').querySelector('h5').innerText = decodeURIComponent(sub_L);
} else {
rootElement.querySelector('.MessageSubject .lia-message-subject').innerText = sub_L;
}
} else {
rootElement.querySelector('.MessageSubject .lia-message-subject').innerText = sub_L;
}
}
/* End: This code is written by iTalent as part of iTrack LILICON - 98 */
}
catch(e){
console.log("subject not available for second time. error details: " + e);
}
} else {
try {
/* Start: This code is written by iTalent as part of LILICON - 98 reported by Ian */
if ("TkbArticlePage" == "IdeaPage") {
if( (sub_L != "") && (sub_L != undefined) && (sub_L != "undefined") ){
document.querySelector('.lia-js-data-messageUid-'+ value).querySelector('.MessageSubject .lia-message-subject').innerText = sub_L;
}
}
else{
if( (sub_L != "") && (sub_L != undefined) && (sub_L != "undefined") ){
rootElement.querySelector('.MessageSubject .lia-message-subject').innerText = sub_L;
/* End: This code is written as part of LILICON - 98 reported by Ian */
}
}
} catch (e) {
console.log("Reply subject not available. error details: " + e);
}
}
}
// Label translation
var labelEle = document.querySelector("#labelsForMessage");
if (!labelEle) {
labelEle = document.querySelector(".LabelsList");
}
if (labelEle) {
var listContains = labelEle.querySelector('.label');
if (listContains) {
/* Commenting this code as bussiness want to point search with source language label */
// var tagHLink = labelEle.querySelectorAll(".label")[0].querySelector(".label-link").href.split("label-name")[0];
var lingoLabelExp = "//lingo-label/text()";
trLabels = [];
trLabelsHtml = "";
/* Get translated labels of the message */
lingoLXML = doc.evaluate(lingoLabelExp, doc, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
/* try{
for(var j=0;j,';
}
trLabelsHtml = trLabelsHtml+'