General Help
What is StatCrunch?
StatCrunch is a statistical data analysis package for
the World Wide Web. It is written in the form of a Java applet. We think users will find it easy to use, and we hope they enjoy
working with our package!
Who we are
StatCrunch was created and programmed by a team of
programmers and statisticians led by Webster West. Dr. West is in the Department of Statistics at
Texas A&M University. The
package was created as an initial attempt to solve many of the problems that
exist with the delivery and use of modern statistical software. Many times
statisticians develop procedures in languages such as Splus, SAS, Minitab,
etc., which are very specific to statisticians. Students and other potential
users may not have access to these packages, and therefore may not be able to
use the procedures. By using Java and the World Wide Web, StatCrunch should reach
the broadest possible audience of any statistical software of its kind.
Getting started
StatCrunch should run on any of the three major platforms (Mac, PC, Unix). Its
only requirement is a Java-capable Web browser, which almost
everyone on the Web now has. If your browser is not Java-enabled, you will probably see a gray box, possibly with a red x in it, after logging in to StatCrunch. A test to determine if a browser is Java-enabled
is given below:
Java -
If the test above indicates that Java is not enabled,
perform one of the following:
Explorer 5.x
- Go to "Tools | Internet Options..." from the main menu
- Change to the "Security" tab
- Click "Custom Level..." button
- To enable: make sure a setting other than "Disable Java" is selected under "Java." If you're not sure which setting to choose, select "High safety"
- Restart the browser
Explorer 4.x
- Go to "View | Internet Options..." from the main menu
- Change to the "Security" tab
- Select "Custom" and click on the "Settings..." button
- To enable: make sure a setting other than "Disable Java" is selected under "Java." If you're not sure which setting to choose, select "High safety"
- Restart the browser
Communicator 4.x
- Go to "Edit | Preferences..." from the main menu
- Select "Advanced" panel
- To enable: make sure "Enable Java" check box is checked.
- Restart the browser
Navigator 3.x
- Go to "Options | Network Preferences..." from the main menu
- Change to the "Languages" tab
- To enable: make sure "Enable Java" check box is checked
- Restart the browser
Using StatCrunch
The Data, Stat, Graphics and
Help menus, located at the top of the StatCrunch frame, provide users with
access to the analysis procedures of the software.
The Help menu is linked to the StatCrunch help page. See the Data, Stat and Graphics
help pages for a listing of these procedures and instructions on how to use
them.
The dataset to be analyzed is displayed inside the data
table located below the menu bar. StatCrunch offers a variety of
methods for loading data. After loading data and selecting a menu item, a listing of
the available procedures will appear in a new window. A dialog box
will appear after selecting one of these procedures. The dialog box
contains a ? button that links the user directly to the
relevant help information for that procedure. After making selections
within the dialog box, the results of the procedure will appear in the
window.
Saving and Printing Results
To copy, save or print StatCrunch results, you will first need to export the result to HTML. To do so, select the Export option under the Options menu of the result window. The graphics in the output are written as GIF files on
the StatCrunch server, so this may take a few moments for results that
contain a large number of graphics. With the latest StatCrunch interface, the results in HTML format will be displayed in the frame below the data table. In older versions of the interface, the results appear in pop-up windows (which may be blocked if you have a pop-up blocker turned on in your browser). In either case, use the browser's File menu to print your results. In most browsers, you can also copy selected graphics and/or text to the clipboard by choosing the Copy option under the Edit menu of your browser or by right-clicking in the frame containing the HTML results. It is important to remember that the graphics links in the
HTML file are to the graphics stored on the StatCrunch server. The file
names for graphics are encoded with random letters so that only
individuals who have the exact file names will have access to them.
Individual graphics may be saved by clicking on the graphic in the new
window and then using the browser's File menu to save or print it.
If graphics are downloaded to a local file system, the IMG tags in the
HTML file must be edited to indicate the proper path to the graphics
files on the local system.
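The IMG tag edit described above can be scripted. The following is a minimal Python sketch, not part of StatCrunch: the function name localize_img_tags, the sample server path and the local folder name are all assumptions made for illustration. It rewrites each IMG src so that it points to a file of the same name in a local folder.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Matches the src attribute of an IMG tag (double-quoted values only).
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc\s*=\s*")([^"]+)(")', re.IGNORECASE)


def localize_img_tags(html, local_dir):
    """Point every IMG src at a file of the same name inside local_dir."""

    def swap(match):
        # Keep only the file name from the original (server) URL.
        file_name = PurePosixPath(urlparse(match.group(2)).path).name
        return f"{match.group(1)}{local_dir.rstrip('/')}/{file_name}{match.group(3)}"

    return IMG_SRC.sub(swap, html)


# Hypothetical exported fragment; the server path and file name are made up.
exported = '<IMG src="http://www.statcrunch.com/results/abcxyz.gif">'
print(localize_img_tags(exported, "graphics"))
```

This sketch assumes the downloaded graphics keep their original file names and that the src values are double-quoted.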
Including StatCrunch in a web page
Feel free to link to
the StatCrunch site using the following syntax:
<A href="http://www.statcrunch.com/">StatCrunch</A>
Linking Data
Using the Link Generator
Form, an HTML link can be created so that a specified data file on
the web will be automatically loaded into StatCrunch when the link is
clicked. Both text and Excel files can be linked.
- Simply specify the text of the link to be displayed on the web
page (e.g., "My Data File").
- Specify the WWW address of the dataset to be loaded (e.g.,
http://www.myData.com/myData.txt).
- If the first line of the data file contains variable names, check
the Use first line as variable names option.
- If the data file is a text file, specify the delimiter for the
observations. The delimiter options are whitespace (any whitespace
character such as a space or tab), tab, comma (for .csv files) and
semicolon.
As an example, the Excel data file located at http://www.stat.sc.edu/~west/hotdog.xls can be accessed by
clicking the following link: Hotdog Data
Using WHERE
When selecting data to be used with the
various analysis procedures, a WHERE statement can be used to
determine which rows from the data table will be included in the
analysis. The Where statement provides an excellent way to
isolate a subgroup within the data for analysis. The statement should
be a valid boolean expression which evaluates to either a true or false
value. The expression will be evaluated for each row in the data set,
and only rows where the expression evaluates to true will be included
in the analysis. See the section on expressions below for more information on
constructing boolean expressions. Example syntax for Where
statements, using the Hotdog Data, are given below.
- Calories=190
- includes rows where the Calories column is equal to 190
- Calories>150
- includes rows where the Calories column is greater than 150
- Calories>=150
- includes rows where the Calories column is greater than or equal to 150
- Calories<>190
- includes rows where the Calories column is not equal to 190
- Calories!=190
- includes rows where the Calories column is not equal to 190
- LOG(Calories)>5
- includes rows where the natural logarithm of the Calories column is greater than 5
- Type=Meat
- includes rows where the text in the Type column is Meat
- Type="Meat"
- includes rows where the text in the Type column is Meat. Note that it is only necessary to use double quotes when the text string contains spaces.
- Type<>Beef
- includes rows where the text in the Type column is not Beef
- Sodium=386 AND Type=Meat
- includes rows where the Sodium column is equal to 386 and the Type column is Meat
- Sodium<=400 OR Type="Meat"
- includes rows where the Sodium column is less than or equal to 400 or the Type column is Meat
- (Sodium>=400 AND Sodium<=500) AND Type="Meat"
- includes rows where the Sodium column is between 400 and 500 (inclusive) and the Type column is Meat
- row=5
- includes only the 5th row
- row>=3 AND row<=10
- includes rows 3 through 10
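The row-by-row evaluation described above can be pictured with a small Python sketch. This is an illustration only, not StatCrunch code: the where helper is hypothetical and the sample rows are made up (they are not the actual Hotdog Data values). Each WHERE statement becomes a true/false test applied to every row, with rows numbered from 1.

```python
# Made-up sample rows; these are NOT the actual Hotdog Data values.
rows = [
    {"Type": "Beef", "Calories": 186, "Sodium": 495},
    {"Type": "Meat", "Calories": 190, "Sodium": 386},
    {"Type": "Poultry", "Calories": 129, "Sodium": 430},
    {"Type": "Meat", "Calories": 140, "Sodium": 520},
]


def where(rows, predicate):
    """Return the 1-based numbers of the rows for which predicate is true."""
    return [n for n, r in enumerate(rows, start=1) if predicate(n, r)]


# Calories>150
over_150 = where(rows, lambda row, r: r["Calories"] > 150)
# Sodium=386 AND Type=Meat
meat_386 = where(rows, lambda row, r: r["Sodium"] == 386 and r["Type"] == "Meat")
# row>=2 AND row<=3
middle = where(rows, lambda row, r: row >= 2 and row <= 3)

print(over_150)  # [1, 2]
print(meat_386)  # [2]
print(middle)    # [2, 3]
```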
Expressions
Some StatCrunch procedures allow the user to
input either a boolean (true/false) or mathematical expression. See
the compute expression
section to see examples of mathematical expressions. See the WHERE section for examples of boolean
expressions used to control the data rows that are included in an
analysis. Notes on using expressions:
- Most expressions should contain references to the existing
columns in the data table. If there is a column name that contains a
space, references to the column need to be enclosed in double quotes
(e.g., "Column One").
- Row may be used to refer to the row id column in the data
table. This is a StatCrunch keyword, so any other columns given this name
will not be properly referenced.
- Parentheses can be used to force the order of evaluation in both mathematical and boolean expressions.
- The syntax for StatCrunch expressions
follows ANSI SQL syntax. The components that can be used in an expression are listed below.
- Comparison Operators
The operators below are very useful when constructing boolean expressions
for Where statements.
- =
- tests for equality of numeric or text values
- >
- tests if one numeric value is greater than another numeric value
- <
- tests if one numeric value is less than another numeric value
- >=
- tests if one numeric value is greater than or equal to another numeric
value
- <=
- tests if one numeric value is less than or equal to another numeric
value
- <>
- tests for nonequality of values
- !=
- tests for nonequality of values
- IS NULL
- tests for a null value (empty cell)
- IS NOT NULL
- tests for a nonnull value
- Logical Operators
The operators below are very useful when constructing boolean expressions
for Where statements.
- AND
- compares two boolean values, returns true if both are true, and false
otherwise
- OR
- compares two boolean values, returns true if either is true, and false
otherwise
- Arithmetic Operators
These operators return numeric results when used with numeric arguments
and null values otherwise.
- /
- divides two numeric values
- *
- multiplies two numeric values
- +
- adds two numeric values
- -
- subtracts two numeric values
- **
- exponentiates one numeric value by another
- ^
- Same as ** above.
- Comparison Functions
The functions below produce boolean (true/false) outputs.
You can specify column names as inputs, in which case
the function will return a vector applying the function to each value in the input column.
- between(x,y,z)
- returns true if x is between y and z (noninclusive) and false otherwise
- ifelse(x,y,z)
- returns y if x is true and z otherwise
- ifnull(x,y,z)
- returns y if x is null (empty) and z otherwise
- isNaN(x)
- returns true if x is not a numeric value and false otherwise
- isNull(x)
- returns true if x is null and false otherwise
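As a rough illustration of the descriptions above, the sketch below mimics between, ifelse and ifnull in Python. It is an assumption-laden sketch rather than StatCrunch's implementation: None stands in for a null (empty) cell, and between assumes y is less than z.

```python
def between(x, y, z):
    """True if x is strictly between y and z (endpoints excluded)."""
    return y < x < z


def ifelse(x, y, z):
    """Return y if the condition x is true and z otherwise."""
    return y if x else z


def ifnull(x, y, z):
    """Return y if x is null and z otherwise (None stands in for an empty cell)."""
    return y if x is None else z


# Applying a function to every value of a made-up column gives one result per row.
calories = [186, 150, None, 129]
print([ifnull(c, "missing", "present") for c in calories])
print([between(c, 120, 150) for c in calories if c is not None])
```

Note that 150 is not between 120 and 150 here, because the comparison is noninclusive.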
- Mathematical Functions
The functions below require numeric inputs and provide numeric outputs.
StatCrunch attempts to coerce nonnumeric values into the correct input type.
You can specify column names as inputs, in which case
the function will return a vector applying the function to each value in the input column.
- abs(x)
- absolute value
- acos(x)
- arc cosine
- asin(x)
- arc sine
- atan(x)
- arc tangent
- ceil(x)
- ceiling, round up
- cos(x)
- cosine
- dbeta(x,alpha,beta)
- beta distribution function at the value x with shape alpha and scale beta
- dbinom(x,n,p)
- binomial distribution function at the value x with parameters n and p
- dcauchy(x)
- cauchy density at the value x with location 0 and scale 1
- dchisq(x,df)
- chi-square density at the value x with degrees of freedom df
- df(x,ndf,ddf)
- F density at the value x with numerator degrees of freedom ndf and denominator degrees of freedom ddf
- dgamma(x,alpha)
- gamma density at the value x with shape alpha
- dnorm(x,mu,sigma)
- normal density at the value x with mean of mu and standard deviation sigma
- dpois(x,lambda)
- Poisson distribution function at the value x with mean lambda
- dt(x,df)
- t density at the value x with degrees of freedom df
- exp(x)
- exponential function (e raised to the power x)
- floor(x)
- floor, round down
- larger(x,y)
- returns the larger of x and y
- lesser(x,y)
- returns the lesser of x and y
- lngamma(x)
- natural logarithm of the gamma function
- log(x)
- natural logarithm
- log10(x)
- logarithm base 10
- log2(x)
- logarithm base 2
- logbeta(x,y)
- natural logarithm of the beta function
- ln(x)
- natural logarithm base e
- pbeta(x,alpha,beta)
- beta CDF at the value x with shape alpha and scale beta
- pbinom(x,n,p)
- binomial CDF at the value x with parameters n and p
- pcauchy(x)
- cauchy CDF at the value x with location 0 and scale 1
- pchisq(x,df)
- chi-square CDF at the value x with degrees of freedom df
- pf(x,ndf,ddf)
- F CDF at the value x with numerator degrees of freedom ndf and denominator degrees of freedom ddf
- pgamma(x,alpha)
- gamma CDF at the value x with shape alpha
- pnorm(x,mu,sigma)
- normal CDF at the value x with mean of mu and standard deviation sigma
- ppois(x,lambda)
- Poisson CDF at the value x with mean lambda
- pt(x,df)
- t CDF at the value x with degrees of freedom df
- qbeta(x,alpha,beta)
- beta quantile at the value x (between 0 and 1) with shape alpha and scale beta
- qbinom(x,n,p)
- binomial quantile at the value x (between 0 and 1) with parameters n and p
- qcauchy(x)
- cauchy quantile at the value x (between 0 and 1) with location 0 and scale 1
- qchisq(x,df)
- chi-square quantile at the value x (between 0 and 1) with degrees of freedom df
- qf(x,ndf,ddf)
- F quantile at the value x (between 0 and 1) with numerator degrees of freedom ndf and denominator degrees of freedom ddf
- qgamma(x,alpha)
- gamma quantile at the value x (between 0 and 1) with shape alpha
- qnorm(x,mu,sigma)
- normal quantile at the value x (between 0 and 1) with mean of mu and standard deviation sigma
- qpois(x,lambda)
- Poisson quantile at the value x (between 0 and 1) with mean lambda
- qt(x,df)
- t quantile at the value x (between 0 and 1) with degrees of freedom df
- rbeta(n,alpha,beta)
- beta sample of size n with shape alpha and scale beta
- rbinom(n,size,p)
- binomial sample of size n with parameters size and p
- rcauchy(n)
- cauchy sample of size n with location 0 and scale 1
- rchisq(n,df)
- chi-square sample of size n with degrees of freedom df
- rf(n,ndf,ddf)
- F sample of size n with numerator degrees of freedom ndf and denominator degrees of freedom ddf
- rgamma(n,alpha)
- gamma sample of size n with shape alpha
- rnorm(n,mu,sigma)
- normal sample of size n with mean of mu and standard deviation sigma
- round(x)
- rounds to nearest integer
- rpois(n,lambda)
- Poisson sample of size n with mean lambda
- rt(n,df)
- t sample of size n with degrees of freedom df
- sin(x)
- sine
- sqrt(x)
- square root
- tan(x)
- tangent
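The d, p and q prefixes in the list above denote the density (or distribution function), the CDF and the quantile for each distribution. As a minimal sketch of how the three relate, the Python code below reproduces the normal versions with the standard library's statistics.NormalDist. The Python function names simply mirror the StatCrunch names, and this is not how StatCrunch itself computes them.

```python
from statistics import NormalDist


def dnorm(x, mu, sigma):
    """Normal density at x with mean mu and standard deviation sigma."""
    return NormalDist(mu, sigma).pdf(x)


def pnorm(x, mu, sigma):
    """Normal CDF at x: the probability of a value less than or equal to x."""
    return NormalDist(mu, sigma).cdf(x)


def qnorm(p, mu, sigma):
    """Normal quantile: the value whose CDF equals p (p between 0 and 1)."""
    return NormalDist(mu, sigma).inv_cdf(p)


print(pnorm(0, 0, 1))                # 0.5
print(round(qnorm(0.975, 0, 1), 2))  # 1.96
# The quantile undoes the CDF, up to floating-point error.
print(round(qnorm(pnorm(1.5, 0, 1), 0, 1), 6))
```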
- String Functions
The functions below require string (text) inputs.
StatCrunch attempts to coerce nonstring values into the correct input type.
You can specify column names as inputs, in which case
the function will return a vector applying the function to each value in the input column.
- contains(x,y)
- returns true if the string x contains the string y and false otherwise
- endsWith(x,y)
- returns true if the string x ends with the string y and false otherwise
- indexOf(x,y)
- returns the first numeric index in the string x where the string y occurs (starting at index of 0), returns -1 if x does not contain y
- lastIndexOf(x,y)
- returns the last numeric index in the string x where the string y occurs (starting at index of 0), returns -1 if x does not contain y
- length(x)
- returns the length of the string x
- replace(x,y,z)
- replaces all occurrences of y with z in x
- startsWith(x,y)
- returns true if the string x starts with the string y and false otherwise
- substring(x,y)
- returns the substring of x beginning at y (starting at index of 0)
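The zero-based indexing used by indexOf, lastIndexOf and substring can be illustrated with Python's built-in string methods, which follow the same conventions. The wrapper names below are hypothetical and only mirror the StatCrunch function names. This is a sketch of the documented behavior, not StatCrunch code.

```python
def index_of(x, y):
    """First index of y in x, counting from 0; -1 if x does not contain y."""
    return x.find(y)


def last_index_of(x, y):
    """Last index of y in x, counting from 0; -1 if x does not contain y."""
    return x.rfind(y)


def substring(x, y):
    """Substring of x beginning at index y, counting from 0."""
    return x[y:]


def replace(x, y, z):
    """Replace all occurrences of y with z in x."""
    return x.replace(y, z)


print(index_of("banana", "an"))       # 1
print(last_index_of("banana", "an"))  # 3
print(index_of("banana", "x"))        # -1
print(substring("banana", 3))         # ana
print(replace("banana", "an", "AN"))  # bANANa
```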
- Column Functions
The functions below have a column name(s) as an argument(s) and return
numeric values or vectors. The function names are not case sensitive.
- count(x)
- returns the number of values in the column
- cor(x,y)
- correlation between x and y
- cov(x,y)
- covariance between x and y
- max(x)
- returns the maximum of the column
- mean(x)
- returns the mean of the column
- median(x)
- returns the median of the column
- min(x)
- returns the minimum of the column
- percentile(x,p)
- returns the pth percentile of x
- range(x)
- returns the range of the column
- rep(x,y)
- returns each value in x repeated a corresponding number of y times (with all sequences stacked)
- seq(x,y,z)
- returns the sequence of values from x to y by z (with all sequences stacked)
- sort(x)
- returns the sort of the column
- std(x)
- returns the standard deviation of the column
- sum(x)
- returns the sum of the column
- var(x)
- returns the variance of the column
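The rep function is the least obvious entry in the list above. The Python sketch below shows one reading of its description: each value in x is repeated the number of times given by the matching entry of y, and the repeated runs are stacked into a single column. The sample values are made up, and the handling of null or non-integer counts in StatCrunch is not covered.

```python
def rep(x, y):
    """Repeat each value of x the corresponding number of times in y, stacked."""
    stacked = []
    for value, times in zip(x, y):
        stacked.extend([value] * times)
    return stacked


print(rep(["a", "b", "c"], [2, 0, 3]))  # ['a', 'a', 'c', 'c', 'c']
```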
- Row Functions
The functions below require a comma delimited list of column names (...) as an argument and, in most cases, return
a vector with one value for each row.
- pconcat(...)
- row wise string concatenation
- pmax(...)
- row wise maximum
- pmean(...)
- row wise mean
- pmin(...)
- row wise minimum
- pvar(...)
- row wise variance
- concat(...)
- stacks all values into a single column
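To make the difference between the row wise functions and concat concrete, here is a small Python sketch with two made-up columns. It assumes columns of equal length with no null values, and it is only an illustration of the descriptions above.

```python
from statistics import mean


def pmax(*columns):
    """Row wise maximum across the given columns."""
    return [max(values) for values in zip(*columns)]


def pmin(*columns):
    """Row wise minimum across the given columns."""
    return [min(values) for values in zip(*columns)]


def pmean(*columns):
    """Row wise mean across the given columns."""
    return [mean(values) for values in zip(*columns)]


def concat(*columns):
    """Stack all values into a single column, one input column after another."""
    return [value for column in columns for value in column]


a = [1, 5, 3]
b = [4, 2, 6]
print(pmax(a, b))    # [4, 5, 6]
print(pmin(a, b))    # [1, 2, 3]
print(pmean(a, b))   # [2.5, 3.5, 4.5]
print(concat(a, b))  # [1, 5, 3, 4, 2, 6]
```

The row wise functions return one value per row, while concat returns one value per input cell.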
Using GROUP BY
Most StatCrunch analysis procedures allow the user to group results based
on a column in the data table. For example, to compute summary
statistics of Calories grouped by Type using the Hotdog Data, select Calories under Select column(s) and
Type as the "Group by" variable. This will return summary
statistics for each distinct value of Type.
Some of the graphics provide an option to view separate graphs for each
group. This option is not selected by default. If this option is not
chosen, then the plot will be color coded based on the grouping
variable for easy reference.
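Conceptually, grouping splits the selected column by each distinct value of the "Group by" column and then runs the procedure once per group. The Python sketch below shows this for a mean. The rows are made up (not the actual Hotdog Data values), and the helper name group_means is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Made-up (Type, Calories) pairs; these are NOT the actual Hotdog Data values.
rows = [
    ("Beef", 186),
    ("Meat", 190),
    ("Beef", 150),
    ("Poultry", 129),
    ("Meat", 140),
]


def group_means(rows):
    """Mean of the second field for each distinct value of the first field."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {group: mean(values) for group, values in sorted(groups.items())}


for group, value in group_means(rows).items():
    print(group, value)
```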
Fonts
StatCrunch allows the user to specify the three separate fonts that are used to display the data table, text results
(tabular results),
and graphical results. To specify these fonts, use the Edit > Fonts
menu option, and then specify the name/size for each of the fonts. Note that the available fonts
may vary depending on the fonts available on the user's computer system. Also, note that
the font specified for graphical results is the maximum font size that may be used. When constructing graphics,
StatCrunch may shrink the font in order to fit the graphic nicely in a standard sized result window. The corresponding result window
may be manually resized to increase the font size up to the maximum font specified for graphics.
Orderings
StatCrunch allows the user to create orderings that are used to determine the display order for certain types
of tabular and graphical output. An ordering can be specified to help StatCrunch display output in a more natural way.
As examples, StatCrunch provides predefined orderings for both the natural ordering of the days of week
(Sunday,Monday,Tuesday,... and Sun,Mon,Tue,...)
and the natural ordering of the months of the year
(January,February,March,... and Jan,Feb,Mar,...). When StatCrunch produces output with a set of group labels that
contains only the values defined in an ordering or some subset of them, the groups will be reordered in the output according
to their relative position in the ordering sequence rather than using an alphanumeric ordering which is otherwise standard.
Orderings can be added, deleted and modified using the Edit > Orderings menu option. When this option is selected,
a new dialog will appear with a listing of all the active orderings for the current StatCrunch session. To modify/remove a
particular ordering, select the ordering from the list and then press the Edit/Delete button. A new ordering can
be added by clicking the Add new ordering button. When modifying or adding an ordering, the distinct values should be
entered one per line in the resulting text field. StatCrunch ignores case when comparing ordering values to group labels
to determine whether or not to apply the ordering to specific output.
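The matching rule described above can be sketched in Python as follows. This is an interpretation of the text, not StatCrunch's actual code: the function name order_labels is hypothetical, and plain sorting stands in for the default alphanumeric ordering.

```python
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]


def order_labels(labels, orderings):
    """Order group labels by the first ordering that contains all of them.

    Comparison ignores case. If no ordering covers every label, fall back
    to plain sorting.
    """
    lowered = [label.lower() for label in labels]
    for ordering in orderings:
        position = {value.lower(): i for i, value in enumerate(ordering)}
        if all(label in position for label in lowered):
            return sorted(labels, key=lambda label: position[label.lower()])
    return sorted(labels)


# A subset of the ordering, in mixed case, is put in weekday order.
print(order_labels(["wednesday", "Monday", "Sunday"], [DAYS]))
# Labels outside every ordering are sorted in the default way.
print(order_labels(["pear", "apple"], [DAYS]))
```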
Contact Us
With questions or comments, please submit a request via the tech support page.