Graphics Help
Interacting with Graphics
Interacting with graphics is a powerful tool for data
analysis. Most of the graphics in StatCrunch 3.0 are interactive (with
the exception of the Means Plot and Chart Group Stats). To interact
with a graphic, draw a rectangle around the desired objects in the
graph by clicking and dragging the mouse. The objects will be
highlighted in the graph as well as in all other interactive graphics and
in the data table. The method
that each individual graphic uses to handle interaction is discussed
below.
Graph Layout
The appearance of graphics may be customized by specifying graph
layout options such as the number and format of graphs per
page and the color scheme used in the graphics. These options are
typically specified in the last dialog screen when producing graphics.
By default, the number of rows and the number of columns per page is
one, so one graph per page is produced. A page here is defined by the
visible width and height of a browser window. When the window is
resized, the graph resizes to fill the entire browser window. By
changing the number of rows and number of columns, one can produce a
matrix of plots. For example, if the number of rows is set to three
and the number of columns is set to two, the resulting output will be
formatted so that a three by two matrix of six plots per page will be
visible. Color Schemes are discussed
below in detail.
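As a rough illustration of the rows-by-columns layout, the following Python sketch maps a graph's position in the output to a page and a cell on that page. The function name grid_position and the row-by-row fill order are assumptions made for illustration; the text above does not state the order in which StatCrunch fills the matrix.

```python
def grid_position(graph_index, rows, columns):
    """Map a 0-based graph index to (page, row, column), all 0-based.

    Assumes cells are filled left to right, then top to bottom
    (an assumption; not confirmed by the StatCrunch documentation).
    """
    per_page = rows * columns
    page, cell = divmod(graph_index, per_page)
    row, column = divmod(cell, columns)
    return page, row, column

# With three rows and two columns, six graphs fit on each page.
print(grid_position(0, 3, 2))  # (0, 0, 0)
print(grid_position(5, 3, 2))  # (0, 2, 1)
print(grid_position(6, 3, 2))  # (1, 0, 0)
```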
Color Schemes
By adding/changing color scheme options under the Graphics
menu, one can control the way that StatCrunch accesses colors when
producing graphics. A color scheme is defined by an ordered sequence
of colors that are accessed in succession when StatCrunch produces a
graphic. One color from the sequence is defined as the background for
the graphic, and one color is defined as the foreground which is the
default color of axes and other standard graphic elements. In a
graphic that uses multiple colors, the background and foreground
colors are not included as StatCrunch cycles through the list. The
default color scheme consists of the sequence: black, white, red,
blue, green, yellow, cyan, orange, dark green, and gray with black
being the foreground and white being the background.
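The way a graphic draws colors from a scheme can be sketched in Python as follows. This is an illustrative model only: the function name plot_colors is hypothetical, and the wrap-around to the start of the list once the colors run out is an assumption based on the description of cycling above.

```python
from itertools import cycle, islice

# The default sequence as listed in the text above.
DEFAULT_SCHEME = ["black", "white", "red", "blue", "green",
                  "yellow", "cyan", "orange", "dark green", "gray"]

def plot_colors(scheme, foreground, background, count):
    """Return `count` colors, taken in order from the scheme while
    skipping the foreground and background colors.

    Wrapping back to the first usable color is an assumption.
    """
    usable = [c for c in scheme if c not in (foreground, background)]
    return list(islice(cycle(usable), count))

# Ten groups need ten colors, but only eight are usable, so the
# ninth and tenth groups reuse red and blue in this model.
colors = plot_colors(DEFAULT_SCHEME, "black", "white", 10)
print(colors)
```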
StatCrunch also offers two color schemes which consist of a gradient of
colors between two primary colors: Grayscale (white to black)
and Red to Blue. These scale color schemes may be most
useful when grouping by a binned numerical column. The
number of colors in the sequence for a scale color scheme depends on
the number of colors needed in a particular situation. If 28 colors
are required, then StatCrunch defines a sequence of 28 colors between the
two primary colors. The background and foreground for these color
schemes are white and black, respectively, and these definitions may
not be changed.
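One plausible way to build such a sequence is linear interpolation of the RGB values between the two primary colors. The Python sketch below assumes that approach; the function name scale_colors is hypothetical, and the exact interpolation StatCrunch uses is not specified in this help text.

```python
def scale_colors(start_rgb, end_rgb, count):
    """Return `count` RGB tuples evenly spaced from start_rgb to end_rgb.

    Linear interpolation per channel is an assumption for illustration.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        return [tuple(start_rgb)]
    colors = []
    for i in range(count):
        t = i / (count - 1)  # 0.0 at the first color, 1.0 at the last
        colors.append(tuple(round(s + (e - s) * t)
                            for s, e in zip(start_rgb, end_rgb)))
    return colors

# Grayscale (white to black) with the 28 colors used as an example above.
grayscale = scale_colors((255, 255, 255), (0, 0, 0), 28)
```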
StatCrunch allows users to edit existing color schemes and add new color
schemes. One may do this by clicking on the Color Schemes link
under the Graphics menu. To edit an existing color scheme, select
the color scheme from the list of defined color schemes and click the
OK button. To add a new color scheme, select the Add a new color
scheme option and then specify a name for the color scheme. To construct a scale color scheme, select the
Use scale option.
- If the selected color scheme is not a scale color scheme, a new
dialog screen will appear where one can change the background and
foreground colors for the color scheme as well as the definition of
the colors on the list. The RGB attributes of an existing color as
well as its position in the sequence are displayed when the color is
selected from the color list. These properties may be changed by
clicking the Update button. RGB values must be integers in the
range from 0 to 255. New
colors may also be added to the list by clicking the Add button
after specifying a name for the color, its RGB attributes, and its
position in the sequence. A selected color may be removed from the list
by clicking the Remove button.
- If the selected color scheme is a scale color scheme, a new dialog screen will appear that allows one to specify the RGB values for the starting and ending colors for the color sequence.
When one is finished editing the color scheme, click the Save button. The screen displaying the options for editing and adding color schemes will then reappear. Click the Cancel button when finished editing color schemes.
Bar Plot
Displays the frequency (or relative frequency) for all distinct values of selected columns.
- Select the column(s) to be displayed in the plot(s). A separate plot will be generated for each column selected.
- Enter an optional Where clause to specify the data rows to be included in the computation.
- Select an optional Group by column to group results. By default if a Group by column is specified, the frequency
(relative frequency) of each distinct value of each selected column will be displayed in a sequence of bars color-coded by the corresponding values of the Group by column.
This is commonly referred to as a side-by-side bar plot and is denoted by the Split bars option in StatCrunch.
There is also an option to stack bars when a group by column is used. In this case, the color-coded bars corresponding to the different values of the grouping column are stacked one on top of the other.
This type of graphic may be a great choice when emphasis is to be placed on the total number in a category rather than emphasizing different totals between groups.
To create a separate bar graph for each unique value of the Group by column, select the separate graph for each group option.
- Click the Next button to choose between plotting the frequency, relative frequency, percent, relative frequency (within category) or percent (within category) on the y-axis.
For each type of plot, the distinct values (categories) of the columns selected will be shown on the x-axis. The within category plot types are used only when a group by column is specified. When these plot types are specified,
relative frequencies and percents are calculated for each unique value of the group by column relative to the total number of observations within each category. Otherwise, relative frequencies and percents are calculated relative to the total
number of observations across all categories.
- Click the Next button again to specify graph layout options.
- Click the Create Graph! button to create the plot(s).
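The difference between the overall and the within category calculations can be illustrated with a short Python sketch. The function name bar_heights and the sample data are hypothetical; the code follows the definitions given above, where a category is a distinct value of the selected column.

```python
from collections import Counter

def bar_heights(values, groups):
    """Compute bar heights for each (category, group) pair.

    `values` holds the selected column and `groups` the Group by column.
    """
    counts = Counter(zip(values, groups))
    total = len(values)
    category_totals = Counter(values)
    result = {}
    for (category, group), n in counts.items():
        result[(category, group)] = {
            "frequency": n,
            # Relative to all observations across all categories.
            "relative": n / total,
            # Relative to the observations within this category only.
            "relative_within_category": n / category_totals[category],
        }
    return result

values = ["A", "A", "A", "B"]
groups = ["x", "y", "y", "x"]
heights = bar_heights(values, groups)
```

Multiplying either relative frequency by 100 gives the corresponding percent.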
Pie Chart
Displays the relative frequency for all distinct values of selected columns.
- Select the column(s) for which a pie chart is to be constructed. A separate chart will be generated for each column selected.
- Enter an optional Where clause to specify the data rows to be included in the computation.
- Select an optional Group by column to construct a separate pie chart for each distinct value of this column.
- Click the Next button to choose what information (count and/or percent of total) to display in the label for each category.
- Click the Next button again to specify graph layout options.
- Click the Create Graph! button to create the plot(s).
Histogram
Displays the frequency, relative frequency or density for numerical data binned into classes.
- Select the column(s) to be displayed in the plot(s). A separate plot will be generated for each column selected.
- Enter an optional Where clause to specify the data rows to be included in the computation.
- Select an optional Group by column to construct a histogram for each distinct value of this column.
- Click the Next button to select either the Frequency, Relative Frequency or Density histogram. In addition, optional values for the starting point of the bins and the bin width may be specified. These parameters will apply to all of the histograms to be constructed.
- Click the Next button again to specify graph layout options.
- Click the Create Graph! button to create the plot(s).
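The three histogram types differ only in how bar heights are scaled. The Python sketch below bins values using a starting point and a bin width and reports all three heights. The function name histogram is hypothetical, and two details are assumptions: bins are closed on the left, and density is the relative frequency divided by the bin width (the usual definition, under which the bar areas sum to one).

```python
import math

def histogram(values, start, width):
    """Bin values into classes [start + k*width, start + (k+1)*width).

    Left-closed bins are an assumption made for illustration.
    """
    n = len(values)
    counts = {}
    for v in values:
        k = math.floor((v - start) / width)  # index of the bin holding v
        counts[k] = counts.get(k, 0) + 1
    rows = []
    for k in sorted(counts):
        freq = counts[k]
        rows.append({
            "lower": start + k * width,
            "frequency": freq,
            "relative": freq / n,
            "density": freq / (n * width),
        })
    return rows

rows = histogram([1, 2, 2, 3, 7], start=0, width=2)
```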
Stem and Leaf
Displays a character based plot of a column that is similar to
a histogram turned on its side. The actual (or approximate) data values are
represented in the plot.
- Select the column(s) to be displayed in the plot(s). A separate plot will be generated for each column selected.
- Enter an optional Where clause to specify the data rows to be included in the computation.
- Select an optional Group by column to construct a separate stem and leaf plot for each distinct value of this column.
- Click the Next button to choose how to trim outliers. StatCrunch offers three trimming options: no trimming, mild and extreme outliers, and extreme only (the default).
Trimming mild and/or extreme outliers will remove the appropriate data values from the plot and place these outliers on separate
Low and/or High stems. Mild outliers are more than 1.5 times the interquartile range below (above) the first (third) quartile.
Extreme outliers are more than 3 times the interquartile range below (above) the first (third) quartile.
- Click the Create Graph! button to create the plot(s).
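The outlier rules above can be sketched in Python as follows. The function name classify_outliers is hypothetical. Two points are assumptions: the quartiles come from the default method of Python's statistics.quantiles, which may differ from the method StatCrunch uses, and a value beyond the extreme cutoff is reported only as extreme, not also as mild.

```python
import statistics

def classify_outliers(values):
    """Split outliers into mild and extreme using the 1.5 and 3 IQR rules."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    mild_low, mild_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    extreme_low, extreme_high = q1 - 3 * iqr, q3 + 3 * iqr
    mild, extreme = [], []
    for v in sorted(values):
        if v < extreme_low or v > extreme_high:
            extreme.append(v)
        elif v < mild_low or v > mild_high:
            mild.append(v)
    return {"mild": mild, "extreme": extreme}

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 30, 100]
outliers = classify_outliers(data)
print(outliers)  # {'mild': [30], 'extreme': [100]}
```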
Boxplot
Displays a graphical representation of the 5-number summary for a set of numerical values, or optionally, a boxplot using inner and outer fences.
- Select the column(s) to be displayed in the plot(s). If multiple columns are selected, the plots will be stacked in the reverse order of selection in the same graphic.
- Enter an optional Where clause to specify the data rows to be included in the computation.
- Select an optional Group by column to construct boxplots for each distinct value of this column. If a Group by column is specified, select either to stack plots of each group for each column to be plotted or to stack plots of each column for each group.
- Click the Next button to choose to use fences when constructing the plots. By default, this option is not selected.
- Click the Next button again to specify graph layout options.
- Click the Create Graph! button to create the plot(s).
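For reference, the 5-number summary consists of the minimum, first quartile, median, third quartile and maximum. A minimal Python sketch follows; the function name five_number_summary is hypothetical, and the quartile method is the default of statistics.quantiles, which is an assumption and may not match the method StatCrunch uses.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)
    return data[0], q1, median, q3, data[-1]

summary = five_number_summary([7, 1, 9, 3, 5])
print(summary)  # (1, 2.0, 5.0, 8.0, 9)
```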
Dotplot
Displays a graphical representation of numerical values as points on a
number line. Points with the same pixel representation are stacked on
top of each other. If the number of points in a stack exceeds the
height of the graphic, each point on the plot may represent more than
one observation. If this occurs, the number of observations per point
will be shown in the title of the graphic.
- Select the column(s) to be displayed in the plot(s). If multiple
columns are selected, the plots will be stacked in the reverse order
of selection in the same graphic.
- Enter an optional Where clause to specify the data rows to be included in the computation.
- Select an optional Group by column to
construct dotplots for each distinct value of this column. If a Group by column is
specified, select either to stack the plots of each group for each
column or to stack plots of each column for each group.
- Click the Next button to specify graph layout options.
- Click the Create Graph! button to create the plot(s).
Means Plot
Displays the mean plus or minus two standard errors for a set of numerical values. This is not an interactive graphic.
- Select the column(s) to be displayed in the plot(s). If multiple columns are selected, the plots will be stacked in the reverse order of selection in the same graphic.
- Enter an optional Where clause to specify the data rows to be included in the computation.
- Select an optional Group by column to
construct means plots for each distinct value of this column. If a Group by column is
specified, select either to stack the plots of each group for each
column or to stack plots of each column for each group.
- Click the Next button to specify graph layout options.
- Click the Create Graph! button to create the plot(s).
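The quantity shown in a means plot can be computed as in the Python sketch below. The function name mean_interval is hypothetical, and the standard error is assumed to be the sample standard deviation divided by the square root of the sample size.

```python
import math
import statistics

def mean_interval(values):
    """Return (mean - 2*SE, mean, mean + 2*SE) for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    # Standard error of the mean, using the sample standard deviation.
    se = statistics.stdev(values) / math.sqrt(n)
    return mean - 2 * se, mean, mean + 2 * se

low, center, high = mean_interval([1, 2, 3, 4, 5])
```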
QQ Plot
Displays the sample quantiles of a variable versus the quantiles of a
standard normal distribution.
- Select the column(s) to be displayed in the plot(s).
- Enter an optional Where clause to specify the data rows to be included in the computation.
- Select an optional Group by column to generate a separate QQ plot for each distinct value of the Group by column.
- Click the Next button to specify graph layout options.
- Click the Create Graph! button to create the plot(s).
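The points of a QQ plot can be computed as in the following Python sketch. The function name qq_points is hypothetical, and the plotting positions (i - 0.5) / n are one common convention chosen here as an assumption; this help text does not say which convention StatCrunch uses.

```python
from statistics import NormalDist

def qq_points(values):
    """Pair each sorted sample value with a standard normal quantile.

    Returns a list of (normal_quantile, sample_quantile) tuples.
    """
    data = sorted(values)
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(data, start=1)]

points = qq_points([3, 1, 2, 5, 4])
```

If the data are roughly normal, the resulting points fall close to a straight line.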
Scatter Plot
Displays pairs of numerical values (points) on typical Cartesian (perpendicular) axes.
- Select the column containing the X-values of the points.
- Select the column containing the Y-values of the points.
- Enter an optional Where clause to specify the data rows to be included in the computation.
- Select an optional Group by column to group results. By default if a Group by
column is specified, a single scatter plot will be generated where the
points are color-coded according to the distinct values of the
Group by column. To create a separate scatter plot for each
unique value of the Group by column, check the separate
graph for each group option.
- Click the Next button to specify graph layout options.
- Click the Create Graph! button to create the plot(s).
Multi Plot
Plots multiple pairs of points on the same graph or separate graphs. Pairs
may be plotted as points, connected with lines or both plotted with
points and connected with lines.
- Select the column containing the X-values of the points.
- Select the column containing the Y-values of the points.
- Choose the method for plotting the pairs (points, lines or both).
- Click on Add to add the pairing to the plot. The pairing
will then be displayed in the selection box. To delete the pairing,
select it and click on Delete.
- Repeat the above steps to select multiple pairings.
- Check the Separate graph for each variable combination
option if you do not want the selected pairs plotted on the same graphic.
- Enter an optional Where clause to
specify the data rows to be included in the plot.
- Select an optional Group by column to
group results. If a Group by column is used, the Separate
graph for each variable combination option is ignored. By default if a
Group by column is specified, a single plot will be generated
where each pairing is color-coded according to the distinct values of
the Group by column. To create a separate plot for each unique
value of the Group by column, check the separate
graph for each group option.
- Click the Next button to specify graph layout options.
- Click the Create Graph! button to create the plot(s).
Index/Time Plot
Displays the values of a column versus index values, time/date options or custom labels for the x-axis. Consecutive points in the plot are connected with lines.
- Select the column(s) to be displayed in the plot(s). By default if more than one column is selected, the values for each column are color-coded and displayed in a single plot. To display each column in a separate graph, check the separate graph for each column option.
- Enter an optional Where clause to specify the data rows to be included in the computation.
- Click the Next button to specify the format for the x-axis.
- If the index option is chosen, the graph will work as an index plot. Change the start index and increment to customize the x-axis.
- If the time option is chosen, there are a variety of time/date options available. Select the desired type of time or date, and enter the starting values along with the increment between values. The increment will always be in units of the smallest time/date unit. Note: if "Hour Day" or "Minute Hour Day" is selected, the day variable will be sequential and not based on a calendar.
- If the custom option is chosen, any column in the data may serve as the labels for the x-axis. Select which column will be used and enter both the starting row of the labels along with the spacing between labels displayed on the x-axis.
- Click the Next button to customize what is displayed on the graph (points, lines, etc.).
- Click the Next button again to specify graph layout options.
- Click the Create Graph! button to create the plot(s).
Chart Group Stats
Displays the selected statistics for values in a column grouped by
another column.
- Select the statistics to chart.
- Select the column for Data In that contains the values for which the statistics are to be computed.
- Select the column for Groups In that contains the distinct values used to define groups.
- Enter an optional Where clause to specify the data rows to be included in the computation.
- By default all selected statistics will be color-coded and
displayed on a single plot. To construct a separate graph for
each statistic, check the separate graph for each
statistic option.
- Click the Next button to choose what to plot on each graphic. The choices are:
- Plot points
- Connect values with lines
- Plot points with connected lines (default)
- Draw Bars
- Click the Next button again to specify graph layout options.
- Click the Create Graph! button to create the plot(s).
Parallel Coordinates Plot
Displays data for two or more variables on parallel axes. The coordinates of a data value are connected with lines.
- Select the column(s) to be displayed in the plot(s).
- Enter an optional Where clause to specify the data rows to be included in the computation.
- Select an optional Group by column to group results. By default if a Group by column is specified, a single parallel coordinates plot will be generated where the lines connecting coordinates are color-coded according to the distinct values of the Group by column. To create a separate parallel coordinates plot for each unique value of the Group by column, check the separate graph for each group option.
- Click the Next button to specify graph layout options.
- Click the Create Graph! button to create the plot(s).
Pairs Plot
Displays a matrix of pairwise scatter plots for two or more selected columns.
- Select the column(s) to be displayed in the plot(s).
- Enter an optional Where clause to specify the data rows to be included in the computation.
- Select an optional Group by column to group results. By default if a Group by column is specified, a single plot matrix will be generated where the points in each plot are color-coded according to the distinct values of the Group by column. To create a separate plot matrix for each unique value of the Group by column, check the separate graph for each group option.
- Click the Next button to specify graph layout options.
- Click the Create Graph! button to create the plot(s).
3D Rotating Plot
Displays a rotating XYZ scatter plot of three selected columns.
- Select the column containing the X-values of the points.
- Select the column containing the Y-values of the points.
- Select the column containing the Z-values of the points.
- Enter an optional Where clause to specify the data rows to be included in the computation.
- Select an optional Group by column to group results. By default if a Group by column is specified, a single 3D rotating plot will be generated where the points are color-coded according to the distinct values of the Group by column. To create a separate plot for each unique value of the Group by column, check the separate graph for each group option.
- Click the Next button to specify graph layout options.
- Click the Create Graph! button to create the plot(s).
Stars Plot
Displays a sequence of "stars" for each observation in a multivariate data set.
The value of each variable is represented by a line segment drawn at a specific angle from the center of the star
outward. The length of the segment represents the magnitude of the value relative to other values of the variable.
- Select the columns containing the multivariate data.
- Enter an optional Where clause to specify the data rows to be included in the computation.
- Specify an optional column which contains labels to be used for each star.
- Select the use full circle option if you would like the stars to be drawn around a complete circle.
By default the stars are drawn around the top half of a circle.
- Click the Next button to specify graph layout options.
- Click the Create Graph! button to create the stars.
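The geometry of a single star can be sketched in Python as follows. This is an illustrative model, not StatCrunch's actual drawing routine: the function name star_segments is hypothetical, segment lengths are assumed to use min-max scaling of each variable, and the angles are assumed to be evenly spaced over the top half of the circle or over the full circle.

```python
import math

def star_segments(row, minimums, maximums, full_circle=False):
    """Return the (x, y) endpoint of each segment of one star.

    `row` holds one observation; `minimums` and `maximums` hold the
    smallest and largest value of each variable across all observations.
    """
    p = len(row)
    span = 2 * math.pi if full_circle else math.pi
    # Full circle: p distinct angles. Half circle: include both ends.
    steps = p if full_circle else max(p - 1, 1)
    segments = []
    for k, (value, lo, hi) in enumerate(zip(row, minimums, maximums)):
        # Length relative to the other values of the same variable.
        length = (value - lo) / (hi - lo) if hi > lo else 0.0
        angle = span * k / steps
        segments.append((length * math.cos(angle), length * math.sin(angle)))
    return segments

star = star_segments([5, 10, 0], [0, 0, 0], [10, 10, 10])
```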
Word Wall
Displays a graph highlighting the most common words in the selected columns. Each word is displayed within a bar with a width that is
proportional to the number of times the word occurs. The bars are filled with different colors to better separate them visually.
The bars are also stacked in a manner to fill the space available within the graphic.
See http://www.statcrunch.com/twitter for an example word wall.
- Select the columns containing the words of interest.
- Enter an optional Where clause to specify the data rows to be included.
- Select an optional Group by column to group results.
By default if a Group by column is specified, a separate graph will be produced for each unique value of the
Group by column.
- Specify the delimiter to be used to separate words. A blank space is the default delimiter.
- Click the Next button to specify additional options.
- Specify whether or not to add an axis showing word frequency. This option is checked by default.
- Specify whether or not to remove common words and/or punctuation from the word wall. These options are checked by default. StatCrunch has a long
list of common words shown below and in the dialog window. Words can be deleted from the common words list by removing them from the
text area, and additional words can be added to the list by entering them in the text area.
Common words list:
& a able about above abroad according accordingly across actually adj after afterwards again against ago ahead
ain't all allow allows almost alone along alongside already also although always am amid amidst among amongst
an and another any anybody anyhow anyone anything anyway anyways anywhere apart appear appreciate appropriate
are aren't around as a's aside ask asking associated at available away awfully b back backward backwards be became
because become becomes becoming been before beforehand begin behind being believe below beside besides best better
between beyond both brief but by c came can cannot cant can't caption cause causes certain certainly changes
clearly c'mon co co. com come comes concerning consequently consider considering contain containing contains
corresponding could couldn't course c's currently d dare daren't definitely described despite did didn't different
directly do does doesn't doing done don't down downwards during e each edu eg eight eighty either else elsewhere
end ending enough entirely especially et etc even ever evermore every everybody everyone everything everywhere ex
exactly example except f fairly far farther few fewer fifth first five followed following follows for forever former
formerly forth forward found four from further furthermore g get gets getting given gives go goes going gone got gotten
greetings h had hadn't half happens hardly has hasn't have haven't having he he'd he'll hello help hence her here hereafter
hereby herein here's hereupon hers herself he's hi him himself his hither hopefully how howbeit however hundred i
i'd ie if ignored i'll i'm immediate in inasmuch inc inc. indeed indicate indicated indicates inner inside insofar
instead into inward is isn't it it'd it'll its it's itself i've j just k keep keeps kept know known knows l last lately
later latter latterly least less lest let let's like liked likely likewise little look looking looks low lower ltd m
made mainly make makes many may maybe mayn't me mean meantime meanwhile merely might mightn't mine minus miss more
moreover most mostly mr mrs much must mustn't my myself n name namely nd near nearly necessary need needn't needs neither
never neverless nevertheless new next nine ninety no nobody non none nonetheless noone no-one nor normally not
nothing notwithstanding novel now nowhere o obviously of off often oh ok okay old on once one ones one's only onto
opposite or other others otherwise ought oughtn't our ours ourselves out outside over overall own p particular particularly
past per perhaps placed please plus possible presumably probably provided provides q que quite qv r rather rd re really
reasonably recent recently regarding regardless regards relatively respectively right round s said same saw say saying
says second secondly see seeing seem seemed seeming seems seen self selves sensible sent serious seriously seven several
shall shan't she she'd she'll she's should shouldn't since six so some somebody someday somehow someone something sometime
sometimes somewhat somewhere soon sorry specified specify specifying still sub such sup sure t take taken taking tell
tends th than thank thanks thanx that that'll thats that's that've the their theirs them themselves then thence there
thereafter thereby there'd therefore therein there'll there're theres there's thereupon there've these they they'd
they'll they're they've thing things think third thirty this thorough thoroughly those though three through throughout
thru thus till to together too took toward towards tried tries truly try trying t's twice two u un under underneath
undoing unfortunately unless unlike unlikely until unto up upon upwards us use used useful uses using usually v value
various versus very via viz vs w want wants was wasn't way we we'd welcome well we'll went were we're weren't we've what
whatever what'll what's what've when whence whenever where whereafter whereas whereby wherein where's whereupon wherever
whether which whichever while whilst whither who who'd whoever whole who'll whom whomever who's whose why will willing
wish with within without wonder won't would wouldn't x y yes yet you you'd you'll your you're yours yourself yourselves
you've z zero
- Click the Next button two more times to specify graph layout options.
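The counting behind a word wall can be sketched in Python as follows. The function name word_counts is hypothetical, COMMON_WORDS is only a tiny subset of the list above, and two behaviors are assumptions: words are compared without regard to case, and punctuation is stripped only from the ends of each word so that contractions stay intact.

```python
from collections import Counter
import string

# A tiny subset of the common words list above, for illustration only.
COMMON_WORDS = {"a", "the", "and", "of", "is"}

def word_counts(texts, delimiter=" ", remove_common=True,
                remove_punctuation=True):
    """Count how often each word occurs across a list of text values."""
    counts = Counter()
    for text in texts:
        for token in text.split(delimiter):
            word = token.lower()  # case folding is an assumption
            if remove_punctuation:
                word = word.strip(string.punctuation)
            if not word:
                continue  # skip empty tokens, e.g. from repeated delimiters
            if remove_common and word in COMMON_WORDS:
                continue
            counts[word] += 1
    return counts

counts = word_counts(["The cat and the hat.", "A cat is a cat!"])
print(counts.most_common())  # [('cat', 3), ('hat', 1)]
```

In a word wall, the width of each bar would then be proportional to these counts.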