
JMP Data Analysis Forum

Scagnostics JMP Add-in – A New Way to Explore your Data

Original poster | Posted on 2014-8-17 21:05:18

Scagnostics (scatterplot diagnostics) were first proposed by John and Paul Tukey and later popularized by Leland Wilkinson and colleagues in "Graph-Theoretic Scagnostics" (2005). These analyses were redefined in "High-Dimensional Visual Analytics: Interactive Exploration Guided by Pairwise Views of Point Distributions" (2006).

The beauty of scagnostics is the ability to explore a dataset visually. JMP has a built-in feature called Scatterplot Matrix (SPLOM), which lets the user compare the relationships between many pairs of variables simultaneously.

However, SPLOMs lose their effectiveness when the number of variables gets too large. Figure 1 shows a portion of the SPLOM report.

Figure 1. SPLOM for Drosophila Aging Data


We want to explore the Drosophila Aging data, which has 48 observations and 100 numeric variables. Notice in Figure 1 the substantial number of variables in this dataset. This can be overwhelming, and our ability to inspect the data visually suffers. Figure 1 shows only about 15% of the actual SPLOM. In a world where our datasets are growing every day, it is imperative to be able to extract meaningful information from the relationships between our variables. That's where scagnostics comes in! Scagnostics assesses five aspects of scatterplots: outliers, shape, trend, density, and coherence.
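To make the scale problem concrete, here is a small Python sketch that counts the distinct variable pairs, and therefore the distinct scatterplots, in a SPLOM. The function name `splom_panels` is made up for illustration and is not part of JMP or the add-in.

```python
from math import comb


def splom_panels(num_variables):
    """Number of distinct variable pairs, i.e. one scatterplot per pair."""
    return comb(num_variables, 2)


# The Drosophila Aging data has 100 numeric variables.
print(splom_panels(100))  # 4950
```

With 100 variables there are 4,950 distinct pairwise scatterplots, which is far more than anyone can scan by eye.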

This summer, I had the privilege of writing a JMP add-in (available for download with a free SAS profile) that allows the user to explore data interactively using nine graph-theoretic measures. The add-in combines three existing features of JMP: Distribution, Scatterplot Matrix, and Graph Builder. Each point in the scagnostics scatterplot matrix represents a 2D scatterplot of one pair of variables. When the user selects a point in the scatterplot matrix in the bottom left, Graph Builder shows the corresponding scatterplot for the two variables in the bottom right.

As an example, one point has already been selected in the SPLOM in Figure 2. The corresponding variables are log2in_Tsp42Ej and log2in_CG6372. For this pair of variables, there are two discernible clusters of data. This is reflected in a high Clumpy value.

Figure 2. Scagnostics for Drosophila Aging Data – Clumpy Example


Figure 3 below shows that when we select a point with a high Monotonic value, we observe a clear association and a strong linear relationship between the variables log2in_alpha_Cat and log2in_CG3430der.

Figure 3. Scagnostics for Drosophila Aging Data – Monotonic Example
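For readers curious about what a high Monotonic value means numerically: in the graph-theoretic scagnostics literature, the Monotonic measure is defined as the squared Spearman rank correlation. The Python sketch below illustrates that definition only. It is not the add-in's code (which the post does not show), and the helper names are made up for illustration.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def monotonic(x, y):
    """Squared Spearman correlation: Pearson correlation of the ranks, squared."""
    r = pearson(average_ranks(x), average_ranks(y))
    return r * r


x = [1, 2, 3, 4, 5]
print(monotonic(x, [v ** 3 for v in x]))  # increasing but nonlinear
print(monotonic(x, [1, 4, 0, 4, 1]))      # no monotone trend
```

Because the measure works on ranks, it rewards any consistently increasing or decreasing pattern, not only straight lines, so a high value is worth confirming visually, as Figure 3 does.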


Another key aspect of scagnostics is outlier detection. Review the Graph Builder plot in Figure 4 below. When we inspect the two variables log2in_CG18178 and log2in_BcDNA_GH04120, we see two data points that visually appear to be outliers. A substantial Outlying value, together with a relatively high Skewed value, supports the notion that this pair of variables has major outliers.

Figure 4. Scagnostics for Drosophila Aging Data – Outlying Example
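As a rough illustration of how an Outlying score can be computed, the Python sketch below follows the minimum-spanning-tree idea from the graph-theoretic scagnostics literature: build the MST of the points, flag edges longer than q75 + 1.5 × (q75 − q25), treat a point as outlying when all of its MST edges are that long, and report the share of total MST length attached to outlying points. This is a simplified sketch under those assumptions. It omits the binning and preprocessing used in the published method, it is not the add-in's implementation, and all function names are made up for illustration.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j, length)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known link from each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def quantile(sorted_vals, q):
    """Quantile by linear interpolation between order statistics."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def outlying(points):
    """Share of MST length on edges touching outlying points (0 = none)."""
    if len(points) < 3:
        return 0.0
    edges = mst_edges(points)
    lengths = sorted(d for _, _, d in edges)
    q25, q75 = quantile(lengths, 0.25), quantile(lengths, 0.75)
    cutoff = q75 + 1.5 * (q75 - q25)
    adjacent = {i: [] for i in range(len(points))}
    for i, j, d in edges:
        adjacent[i].append(d)
        adjacent[j].append(d)
    # A point is outlying when every MST edge touching it exceeds the cutoff.
    outliers = {v for v, ds in adjacent.items() if all(d > cutoff for d in ds)}
    long_len = sum(d for i, j, d in edges if i in outliers or j in outliers)
    total = sum(lengths)
    return long_len / total if total > 0 else 0.0


grid = [(x, y) for x in range(3) for y in range(3)]
print(outlying(grid))               # evenly spaced points, no outliers
print(outlying(grid + [(50, 50)]))  # one far-away point dominates the MST
```

A single distant point contributes one very long MST edge, so the score rises sharply, which matches the intuition behind the two stray points in Figure 4.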


Compared with the original SPLOM report in Figure 1, the recursive SPLOM and Graph Builder reports in Figures 2, 3, and 4 yield much more informative and enlightening analyses.

Now it’s time to download the Scagnostics add-in and begin your own exploration!


Reply 1 | Posted on 2014-8-18 08:59:31
Thanks for sharing
Reply 2 (original poster) | Posted on 2014-8-18 17:54:14