Stat Help

Summary Statistics

Columns

Provides the following descriptive statistics in tabular format for the selected column(s): sample size (n), mean, variance, standard deviation (Std. Dev.), standard error (Std. Err.), median, range, minimum, maximum, first quartile (Q1) and third quartile (Q3).
  1. Select the columns for which summary statistics will be computed.
  2. Enter an optional Where clause to specify the data rows to be included in the computation.
  3. Select an optional Group By column to group results. If a Group By column is selected, choose whether to display the output in separate tables for each column selected or in separate tables for each group.
  4. Click the Next button to select the summary statistics to be computed. The statistics will be displayed in the order in which they are selected (from left to right). Additional percentiles may also be entered as a space- or comma-delimited list.
  5. Check the Store output in data table option if the output is to be placed in the data table.
  6. Click the Calculate button to view the results.
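As an illustration only, the statistics listed above can be computed for a single column with Python's standard library. This is a sketch, not StatCrunch's implementation. In particular, the quartile rule (`statistics.quantiles` with `method="inclusive"`) is an assumption, and StatCrunch may use a different rule for Q1 and Q3. The function name `summary_stats` is hypothetical.

```python
import math
import statistics


def summary_stats(values):
    """Compute the column summary statistics for one list of numbers.

    Quartiles use statistics.quantiles(method="inclusive"); this rule is an
    assumption and may differ from the one StatCrunch uses.
    """
    data = sorted(values)
    n = len(data)
    variance = statistics.variance(data)  # sample variance, divisor n - 1
    std_dev = math.sqrt(variance)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "n": n,
        "Mean": statistics.fmean(data),
        "Variance": variance,
        "Std. Dev.": std_dev,
        "Std. Err.": std_dev / math.sqrt(n),  # standard error of the mean
        "Median": statistics.median(data),
        "Range": data[-1] - data[0],
        "Min": data[0],
        "Max": data[-1],
        "Q1": q1,
        "Q3": q3,
    }


stats = summary_stats([2, 4, 4, 4, 5, 5, 7, 9])
print(stats["Mean"], stats["Median"], stats["Q1"], stats["Q3"])  # prints 5.0 4.5 4.0 5.5
```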

Rows

Provides the following summary statistics for each row of the data table, computed across the selected columns: count, sum, mean, variance, standard deviation, minimum, median and maximum.
  1. Select the columns for which summary statistics will be computed.
  2. Enter an optional Where clause to specify the data rows to be included in the computation.
  3. Click the Next button to select the summary statistics to be computed (by default, all are selected). The statistics will be displayed in the order in which they are selected (from left to right).
  4. Check the Store output in data table option if the output is to be placed in the data table.
  5. Click the Calculate button to view the results.

Correlation

Computes the Pearson correlation between two columns or the corresponding correlation matrix if three or more columns are selected.
  1. Select the columns to be used in the computation.
  2. Enter an optional Where clause to specify the data rows to be included in the computation.
  3. Select an optional Group By column to group results. A separate result (matrix) will be computed for each distinct value of the Group By column.
  4. Click Next to alter the format of the correlation matrix.
  5. Select options under Display in the correlation matrix to add additional values to the correlation matrix.
  6. By default, all selected columns will be displayed in the correlation matrix. To display only specific columns, choose the Selected option under Display columns and specify those to be included. The specified columns will be displayed in the order in which they are selected.
  7. Under Sort rows by correlation with, a column for sorting the correlation matrix in either ascending or descending order may be specified. The column that is selected for sorting will be the first one displayed in the matrix.
  8. Click the Calculate button to view the results.
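A minimal Python sketch of the Pearson correlation and a full correlation matrix follows. The helper names `pearson_r` and `correlation_matrix` are hypothetical, and the sketch ignores Where clauses, Group By columns and missing values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length columns."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return {(name_i, name_j): r} for every pair of named columns."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}


cols = {"x": [1, 2, 3, 4], "y": [2, 4, 5, 9], "z": [4, 3, 2, 1]}
matrix = correlation_matrix(cols)
print(round(matrix[("x", "z")], 6))  # prints -1.0
```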

Covariance

Computes the covariance between two columns or the lower half of the covariance matrix if three or more columns are selected.
  1. Select the columns to be used in the computation.
  2. Enter an optional Where clause to specify the data rows to be included in the computation.
  3. Select an optional Group By column to group results. A separate result (matrix) will be computed for each distinct value of the Group By column.
  4. Click the Calculate button to view the results.
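The covariance calculation can be sketched as follows, assuming the usual sample covariance with divisor n - 1 (an assumption about StatCrunch's convention). Only the lower half of the matrix, including the diagonal of variances, is built. The function names are hypothetical.

```python
def sample_cov(x, y):
    """Sample covariance with divisor n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def lower_covariance_matrix(columns):
    """Lower half (including the diagonal) of the covariance matrix, row by row."""
    names = list(columns)
    return [
        [sample_cov(columns[names[i]], columns[names[j]]) for j in range(i + 1)]
        for i in range(len(names))
    ]


for row in lower_covariance_matrix({"x": [1, 2, 3], "y": [2, 4, 6], "z": [3, 1, 2]}):
    print(row)  # prints [1.0], then [2.0, 4.0], then [-0.5, -1.0, 1.0]
```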

Grouped/Binned data

Provides descriptive statistics in tabular format for data in binned format consisting of bin values and associated counts.
  1. Select the column containing the bins for which summary statistics will be computed. Bin values must use "to" or "-" as a delimiter, e.g. "10 to 20" or "10 - 20".
  2. Select an optional column containing the counts associated with each bin. If this column is omitted, each bin will have a count of 1.
  3. Enter an optional Where clause to specify the data rows to be included in the computation.
  4. Click the Next button to select the summary statistics to be computed. The statistics will be displayed in the order in which they are selected (from left to right). Additional percentiles may also be entered as a space- or comma-delimited list.
  5. Check the Store output in data table option if the output is to be placed in the data table.
  6. Click the Calculate button to view the results.
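The sketch below shows one common way to summarize binned data: each bin is replaced by its midpoint and weighted by its count. Treating bins as midpoints is an assumption here, not a statement of how StatCrunch computes every statistic. The regular expression simply accepts the "to" and "-" delimiters described in step 1. All names are hypothetical.

```python
import re

# Accepts "10 to 20" or "10 - 20" (extra spaces allowed); "-5 - 0" also parses.
BIN_PATTERN = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*(?:to|-)\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_bin(label):
    """Return (lower, upper) bounds for a bin label."""
    match = BIN_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognized bin: {label!r}")
    return float(match.group(1)), float(match.group(2))


def binned_mean_variance(bins, counts=None):
    """Approximate mean and sample variance, treating each bin as its midpoint.

    If counts is omitted, each bin gets a count of 1, as in step 2 above.
    """
    if counts is None:
        counts = [1] * len(bins)
    mids = [sum(parse_bin(b)) / 2 for b in bins]
    n = sum(counts)
    mean = sum(m * c for m, c in zip(mids, counts)) / n
    ss = sum(c * (m - mean) ** 2 for m, c in zip(mids, counts))
    return mean, ss / (n - 1)


mean, var = binned_mean_variance(["10 to 20", "20 - 30"], [1, 3])
print(mean, var)  # prints 22.5 25.0
```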

Tables

Frequency

Provides the frequency and relative frequency for each unique value within selected columns.
  1. Select the columns to be used in the computation.
  2. Enter an optional Where clause to specify the data rows to be included in the computation.
  3. Click the Calculate button to view the results.

Contingency

Computes a two-way frequency table for distinct values in two separate columns and provides a test for independence. The table can be generated using raw data (default) or summary data. To use summary data click on the Use summary data! link at the top of the dialog page.
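The test for independence here is the Pearson chi-square test. A minimal sketch of the statistic and its degrees of freedom, computed from a table of counts (the summary-data form), is shown below. The function name is hypothetical; the P-value would come from the upper tail of a chi-square distribution, for example `scipy.stats.chi2.sf(stat, df)` if SciPy is available.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # expected count under independence: row total * column total / grand total
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


stat, df = chi_square_independence([[10, 20], [20, 10]])
print(round(stat, 4), df)  # prints 6.6667 1
```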

Outcome Table

Displays a table highlighting delimited outcomes in selected columns that contain unstructured lists of outcomes. An outcome table goes through a column row by row and finds the unique items across all of the lists in the column. The number of rows matching each individual outcome is tabulated and displayed both as a count and as a horizontal bar of scaled width, so that the counts for individual items can be easily compared. See the StatCrunch Friend Data application at http://www.statcrunch.com/frienddata/ for numerous examples of outcome tables.
  1. Select the columns containing the outcomes of interest.
  2. Enter an optional Where clause to specify the data rows to be included.
  3. Select an optional Group By column to group results. By default, if a Group By column is specified, a separate table will be produced for each unique value of the Group By column.
  4. Specify the delimiter to be used to separate outcomes. A comma is the default delimiter.
  5. Specify whether or not to limit an outcome to one occurrence per cell. This option can be used to avoid duplication of outcomes for experimental units.
  6. Click the Next button to specify additional options.
  7. Choose the items (counts and/or percentages) to be tabled for each outcome.
  8. Specify the maximum bar width in pixels. The default value is 500.
  9. Choose the ordering for the outcomes to be tabled. By default the outcomes will be tabled in descending order in terms of their frequencies.
  10. Click the Next button to specify additional options.
  11. Specify whether or not to remove common words and/or punctuation from the outcomes table. These options are not checked by default. They are most useful when the outcomes being tabled are individual words. StatCrunch has a long list of common words shown below and in the dialog window. Words can be deleted from the common words list by removing them from the text area, and additional words can be added to the list by entering them in the text area.

    Common words list:
    & a able about above abroad according accordingly across actually adj after afterwards again against ago ahead 
    ain't all allow allows almost alone along alongside already also although always am amid amidst among amongst 
    an and another any anybody anyhow anyone anything anyway anyways anywhere apart appear appreciate appropriate 
    are aren't around as a's aside ask asking associated at available away awfully b back backward backwards be became 
    because become becomes becoming been before beforehand begin behind being believe below beside besides best better 
    between beyond both brief but by c came can cannot cant can't caption cause causes certain certainly changes 
    clearly c'mon co co. com come comes concerning consequently consider considering contain containing contains 
    corresponding could couldn't course c's currently d dare daren't definitely described despite did didn't different 
    directly do does doesn't doing done don't down downwards during e each edu eg eight eighty either else elsewhere 
    end ending enough entirely especially et etc even ever evermore every everybody everyone everything everywhere ex 
    exactly example except f fairly far farther few fewer fifth first five followed following follows for forever former 
    formerly forth forward found four from further furthermore g get gets getting given gives go goes going gone got gotten 
    greetings h had hadn't half happens hardly has hasn't have haven't having he he'd he'll hello help hence her here hereafter 
    hereby herein here's hereupon hers herself he's hi him himself his hither hopefully how howbeit however hundred i 
    i'd ie if ignored i'll i'm immediate in inasmuch inc inc. indeed indicate indicated indicates inner inside insofar 
    instead into inward is isn't it it'd it'll its it's itself i've j just k keep keeps kept know known knows l last lately 
    later latter latterly least less lest let let's like liked likely likewise little look looking looks low lower ltd m 
    made mainly make makes many may maybe mayn't me mean meantime meanwhile merely might mightn't mine minus miss more 
    moreover most mostly mr mrs much must mustn't my myself n name namely nd near nearly necessary need needn't needs neither 
    never neverless nevertheless new next nine ninety no nobody non none nonetheless noone no-one nor normally not 
    nothing notwithstanding novel now nowhere o obviously of off often oh ok okay old on once one ones one's only onto 
    opposite or other others otherwise ought oughtn't our ours ourselves out outside over overall own p particular particularly 
    past per perhaps placed please plus possible presumably probably provided provides q que quite qv r rather rd re really 
    reasonably recent recently regarding regardless regards relatively respectively right round s said same saw say saying 
    says second secondly see seeing seem seemed seeming seems seen self selves sensible sent serious seriously seven several 
    shall shan't she she'd she'll she's should shouldn't since six so some somebody someday somehow someone something sometime 
    sometimes somewhat somewhere soon sorry specified specify specifying still sub such sup sure t take taken taking tell 
    tends th than thank thanks thanx that that'll thats that's that've the their theirs them themselves then thence there 
    thereafter thereby there'd therefore therein there'll there're theres there's thereupon there've these they they'd 
    they'll they're they've thing things think third thirty this thorough thoroughly those though three through throughout 
    thru thus till to together too took toward towards tried tries truly try trying t's twice two u un under underneath 
    undoing unfortunately unless unlike unlikely until unto up upon upwards us use used useful uses using usually v value 
    various versus very via viz vs w want wants was wasn't way we we'd welcome well we'll went were we're weren't we've what 
    whatever what'll what's what've when whence whenever where whereafter whereas whereby wherein where's whereupon wherever 
    whether which whichever while whilst whither who who'd whoever whole who'll whom whomever who's whose why will willing 
    wish with within without wonder won't would wouldn't x y yes yet you you'd you'll your you're yours yourself yourselves 
    you've z zero
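The counting behind an outcome table can be sketched in Python as below. This is an assumed reconstruction for illustration: the function `outcome_table`, its parameters, the case-insensitive common-word match and the rounding of bar widths are all hypothetical, and punctuation removal is not shown.

```python
from collections import Counter


def outcome_table(cells, delimiter=",", one_per_cell=False, common_words=(), max_width=500):
    """Count outcomes in delimited cells and scale a bar width for each.

    Returns (outcome, count, bar_width) tuples in descending order of count.
    """
    stop = {w.lower() for w in common_words}
    counts = Counter()
    for cell in cells:
        items = [item.strip() for item in cell.split(delimiter) if item.strip()]
        items = [item for item in items if item.lower() not in stop]
        if one_per_cell:
            items = list(dict.fromkeys(items))  # drop repeats within this cell, keep order
        counts.update(items)
    top = max(counts.values(), default=0)
    # most_common() sorts by count; ties keep the order first encountered
    return [
        (outcome, count, round(max_width * count / top))
        for outcome, count in counts.most_common()
    ]


rows = ["pizza, tacos", "pizza, pizza", "sushi", "the, tacos"]
for outcome, count, width in outcome_table(rows, one_per_cell=True, common_words=["the"]):
    print(outcome, count, width)  # pizza 2 500 / tacos 2 500 / sushi 1 250
```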
    

Z Statistics

One Sample

Provides hypothesis tests and confidence intervals for a population mean based on a single sample when the population variance is known.
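A minimal sketch of the one-sample Z procedure, assuming a two-sided alternative and a known population standard deviation `sigma`. The function name is hypothetical, and StatCrunch's output layout is not reproduced.

```python
import math
from statistics import NormalDist, fmean


def one_sample_z(data, mu0, sigma, conf_level=0.95):
    """Z test of H0: mu = mu0 (two-sided) and a Z confidence interval, with sigma known."""
    n = len(data)
    xbar = fmean(data)
    std_err = sigma / math.sqrt(n)
    z = (xbar - mu0) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)  # about 1.96 for 95%
    return z, p_value, (xbar - z_crit * std_err, xbar + z_crit * std_err)


z, p, ci = one_sample_z([5, 7, 9], mu0=6, sigma=2)
print(round(z, 4), round(p, 4), [round(v, 3) for v in ci])
```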

Two Sample

Provides hypothesis tests and confidence intervals for the difference (first sample minus second sample) in two means from independent samples.

Proportions

One Sample

Provides hypothesis tests and confidence intervals for the proportion of successes in one sample of trials. The procedure used is a Z test using the normal approximation to the binomial. The procedure can be used with raw data (default) or summary data. To use summary data click on the Use summary data! link at the top of the dialog page.
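The sketch below assumes the usual normal-approximation formulas: the test statistic uses the null proportion in its standard error, while the interval shown is the standard (Wald) interval based on the sample proportion. Whether StatCrunch uses exactly this interval is an assumption, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def one_proportion_z(successes, n, p0, conf_level=0.95):
    """Normal-approximation Z test of H0: p = p0 (two-sided) plus a Wald interval."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    half_width = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)  # standard error from p_hat
    return z, p_value, (p_hat - half_width, p_hat + half_width)


z, p, ci = one_proportion_z(successes=60, n=100, p0=0.5)
print(round(z, 2))  # prints 2.0
```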

Two Sample

Provides confidence intervals and/or hypothesis tests for the difference (first sample minus second sample) in the proportion of successes using data from two independent samples. The procedure used is a Z test using the normal approximation to the binomial. The procedure can be used with raw data (default) or summary data. To use summary data click on the Use summary data! link at the top of the dialog page.

T Statistics

One Sample

Provides confidence intervals and/or hypothesis tests for a population mean based on a single sample when the population variance is not known.

Two Sample

Provides hypothesis tests and confidence intervals for the difference (first sample minus second sample) in two means from independent samples.
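For two independent samples, one common form of the T procedure uses unpooled variances with Welch-Satterthwaite degrees of freedom. The sketch below computes that statistic. Treating it as StatCrunch's default is an assumption; a pooled-variance version uses a different standard error and df. The P-value would come from a t distribution, for example `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available. The function name is hypothetical.

```python
import math
import statistics


def welch_t(sample1, sample2):
    """Unpooled two-sample t statistic for mean1 - mean2 with Welch-Satterthwaite df."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1  # squared standard error of mean 1
    v2 = statistics.variance(sample2) / n2  # squared standard error of mean 2
    std_err = math.sqrt(v1 + v2)
    t = (statistics.fmean(sample1) - statistics.fmean(sample2)) / std_err
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, std_err


t, df, se = welch_t([1, 2, 3, 4], [3, 4, 5, 6])
print(round(t, 4), round(df, 1))
```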

Paired

Provides hypothesis tests and confidence intervals for a difference in population means with paired data. Pairwise differences for values in the selected columns (first minus second) serve as the basis for the computation.
  1. Select the column containing the first sample.
  2. Select the column containing the second sample.
  3. Enter an optional Where clause to specify the data rows to be included in the computation.
  4. Select an optional Group By column to group results. A separate hypothesis test/confidence interval will be computed for each distinct value of the Group By column. Resulting P-values are not adjusted for multiple comparisons.
  5. Check the "Save differences" option to save the differences to the data table.
  6. Click the Next button to select between a hypothesis test and confidence interval computation. The hypothesis test option is selected by default using a null mean difference value of zero and a not equal alternative hypothesis.
    For a hypothesis test: For a confidence interval
  7. Click the Calculate button to view the results. The output will consist of a table containing the sample mean difference (Sample Mean), the standard error of the sample mean difference (Std. Err.), the degrees of freedom (DF), the T statistic (T-stat) and the P-value for the test.
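Because the paired procedure reduces to a one-sample T analysis of the differences, the table entries above can be sketched in a few lines. The function name and the `mu0` parameter (the null mean difference, zero by default as described above) are hypothetical.

```python
import math
import statistics


def paired_t(first, second, mu0=0.0):
    """Paired t statistic based on the differences first - second."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    t = (mean_diff - mu0) / std_err
    return mean_diff, std_err, n - 1, t


mean_diff, se, df, t = paired_t([10, 12, 9, 15], [8, 11, 9, 12])
print(mean_diff, df, round(t, 3))
```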

Variance

One Sample

Provides hypothesis tests and confidence intervals for a population variance based on a single sample when the data come from a normal distribution.

Two Sample

Provides confidence intervals and hypothesis tests for the ratio of two population variances (first sample / second sample) when the samples come from two independent normal distributions.

Regression

Simple Linear Regression

Provides routines for fitting the simple linear regression model.
  1. Select the X variable (independent variable) for the regression.
  2. Select the Y variable (dependent variable) for the regression.
  3. Enter an optional Where clause to specify the data rows to be included in the computation.
  4. Select an optional Group By column to group results. A separate regression analysis will be computed for each distinct value of the Group By column. Resulting P-values are not adjusted for multiple comparisons.
  5. Click the Next button to select between a hypothesis test and a confidence interval computation for the regression parameters. By default, a two-sided hypothesis test with a null value of zero is performed for each parameter.
    For hypothesis tests: For confidence intervals:
  6. Click the Next button for the following options:
  7. Click the Next button again to select from a variety of diagnostic plots.
  8. Click the Next button again to specify graph layout options.
  9. Click the Calculate button to view the results.
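The least-squares fit and the default tests (null value of zero for each parameter) can be sketched as follows. This illustrates the standard formulas rather than StatCrunch's code. The function name and the returned dictionary layout are hypothetical; each parameter entry holds (estimate, standard error, T-stat) on n - 2 degrees of freedom.

```python
import math


def simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x with t statistics for H0: beta = 0."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    mse = sum(r ** 2 for r in residuals) / (n - 2)  # residual variance, df = n - 2
    se_b1 = math.sqrt(mse / sxx)
    se_b0 = math.sqrt(mse * (1 / n + mean_x ** 2 / sxx))
    return {
        "intercept": (b0, se_b0, b0 / se_b0),
        "slope": (b1, se_b1, b1 / se_b1),
        "df": n - 2,
    }


fit = simple_linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(fit["slope"][0], 3), round(fit["intercept"][0], 3))  # prints 0.6 2.2
```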

Polynomial Regression

Provides routines for fitting a polynomial regression model of any degree from one to six.
  1. Select the order of the polynomial (default is 2).
  2. Select the X variable (independent variable) for the regression.
  3. Select the Y variable (dependent variable) for the regression.
  4. Enter an optional Where clause to specify the data rows to be included in the computation.
  5. Select an optional Group By column to group results. A separate regression analysis will be computed for each distinct value of the Group By column. Resulting P-values are not adjusted for multiple comparisons.
  6. Select the optional No Intercept feature to compute all results without an intercept in the model.
  7. Click the Next button to select between a hypothesis test and a confidence interval computation for the regression parameters. By default, a two-sided hypothesis test with a null value of zero is performed for each parameter.
    For hypothesis tests: For confidence intervals:
  8. Click the Next button for the following options:
  9. Click the Next button again to select from a variety of diagnostic plots.
  10. Click the Next button again to specify graph layout options.
  11. Click the Calculate button to view the results.

Multiple Linear Regression

Provides routines for fitting a multiple linear regression model.
  1. In the first panel of options, specify the variables to be included in the regression model. Note that only columns containing at least one numeric value are available for selection in most input fields.
  2. Click the Next button for the following variable selection options in the second dialog panel.
  3. Click the Next button for the following save options in the third dialog panel.
  4. Click the Calculate button to view the regression results.

Logistic Regression

Provides routines for fitting a regression model where the dependent variable is binary.

ANOVA

One Way

Provides a test for the equality of several population means using independent samples from each population.
  1. Depending on the format of the data, select one of the following options:
  2. Enter an optional Where clause to specify the data rows to be included in the computation.
  3. Select the Tukey HSD option and specify a confidence level if you wish to perform a pairwise comparison of means. This option will compute confidence intervals for each mean difference adjusted for multiple comparisons.
  4. Click the Calculate button to view the results. The output will consist of a table of information about the sample means and the ANOVA table.
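The main ANOVA table entries can be sketched from the standard sums of squares. The function name is hypothetical and only the F statistic is computed. The P-value would come from an F distribution with the returned degrees of freedom, for example `scipy.stats.f.sf(F, df_between, df_within)` if SciPy is available.

```python
def one_way_anova(groups):
    """One-way ANOVA sums of squares, degrees of freedom and F statistic."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "SS": (ss_between, ss_within),
        "DF": (df_between, df_within),
        "F": ms_between / ms_within,
    }


print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])["F"])  # prints 27.0
```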

Two Way

Provides a test for the equality of several population means where the populations are stratified across two factors (row and column). This procedure in StatCrunch is restricted to designs with an equal number of observations for each combination of factor levels.
  1. Select the column which contains the sample responses.
  2. Select the column which contains the values of the row factor.
  3. Select the column which contains the values of the column factor.
  4. Enter an optional Where clause to specify the data rows to be included in the computation.
  5. Click the Next button for the following options:
  6. Click the Next button again to specify graph layout options.
  7. Click the Calculate button to view the results. The output will consist of a table of information about the sample means and the ANOVA table.

Nonparametrics

Sign Test

Provides hypothesis tests and confidence intervals for a population median based on a single sample.
  1. Select the column containing the sample values for the calculation(s). If more than one column is selected, a separate test will be done for each column. Resulting P-values are not adjusted for multiple comparisons.
  2. Enter an optional Where clause to specify the data rows to be included in the computation.
  3. Select an optional Group By column to group results. A separate hypothesis test/confidence interval will be computed for each distinct value of the Group By column. Resulting P-values are not adjusted for multiple comparisons.
  4. Click the Next button to select between a hypothesis test and confidence interval computation. The hypothesis test option is selected by default using a null median value of zero and a not equal alternative hypothesis.
    For a hypothesis test: For a confidence interval
  5. Click the Calculate button to view the results. The hypothesis test output will consist of a table containing the number of observations (n), the number used for the test (n for test), the sample median, the number of values below the hypothesized median (Below), the number of values equal to the hypothesized median (Equal), the number of values above the hypothesized median (Above) and the P-value for the test. For a confidence interval, the output consists of the number of observations (n), the sample median, the achieved confidence level and the lower and upper limits on the interval.
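The sign test reduces to a binomial calculation on the values above and below the hypothesized median. The sketch below assumes a two-sided alternative, drops values equal to the hypothesized median (matching the "n for test" column described above) and doubles the smaller tail. The function name is hypothetical.

```python
from math import comb
from statistics import median


def sign_test(data, median0=0.0):
    """Two-sided sign test of H0: median = median0 using the exact binomial distribution."""
    below = sum(1 for v in data if v < median0)
    equal = sum(1 for v in data if v == median0)
    above = sum(1 for v in data if v > median0)
    n_test = below + above  # values equal to median0 are dropped
    k = min(below, above)
    # P(X <= k) for X ~ Binomial(n_test, 0.5)
    tail = sum(comb(n_test, i) for i in range(k + 1)) / 2 ** n_test
    return {
        "n": len(data), "n for test": n_test, "Sample Median": median(data),
        "Below": below, "Equal": equal, "Above": above, "P-value": min(1.0, 2 * tail),
    }


result = sign_test([1.2, -0.4, 2.5, 3.1, 0.0, 1.8, 0.9, 2.2], median0=0)
print(result["Below"], result["Equal"], result["Above"], round(result["P-value"], 4))  # prints 1 1 6 0.125
```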

Wilcoxon Signed Ranks

Provides hypothesis tests and confidence intervals for a population median based on a single sample using signed ranks.
  1. Select the column containing the sample values for the calculation(s). If more than one column is selected, a separate test will be done for each column. Resulting P-values are not adjusted for multiple comparisons.
  2. Enter an optional Where clause to specify the data rows to be included in the computation.
  3. Select an optional Group By column to group results. A separate hypothesis test/confidence interval will be computed for each distinct value of the Group By column. Resulting P-values are not adjusted for multiple comparisons.
  4. Click the Next button to select between a hypothesis test and confidence interval computation. The hypothesis test option is selected by default using a null median value of zero and a not equal alternative hypothesis.
    For a hypothesis test: For a confidence interval
  5. Click the Calculate button to view the results. The hypothesis test output will consist of a table containing the number of observations (n), the number used for the test (n for test), the estimated median using Walsh averages, the Wilcoxon statistic, the P-value for the test and the method used to compute the P-value. For a confidence interval, the output consists of the number of observations (n), the estimated median using Walsh averages, the achieved confidence level and the lower and upper limits on the interval.
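A sketch of the two quantities named above: the signed-rank statistic, computed here as the sum of ranks of positive differences (W+, an assumption about which version StatCrunch reports), and the median of the Walsh averages (x_i + x_j)/2 for i <= j. Zero differences are dropped and tied absolute differences share an average rank. P-values are not computed, and the function names are hypothetical.

```python
from statistics import median


def average_ranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based positions i..j
        i = j + 1
    return ranks


def wilcoxon_signed_rank(data, median0=0.0):
    diffs = [v - median0 for v in data if v != median0]  # zero differences dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    walsh = [(data[i] + data[j]) / 2 for i in range(len(data)) for j in range(i, len(data))]
    return {"n": len(data), "n for test": len(diffs),
            "Wilcoxon Statistic": w_plus, "Estimated Median": median(walsh)}


r = wilcoxon_signed_rank([1, 2, -3, 4, 5])
print(r["Wilcoxon Statistic"], r["Estimated Median"])  # prints 12.0 2.0
```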

Mann-Whitney

Provides hypothesis tests and confidence intervals for comparing two population medians using sample ranks.
  1. Select the column containing the first sample.
  2. Enter an optional Where clause to specify the data rows to be included in the first sample.
  3. Select the column containing the second sample. This column can be the same column containing the first sample.
  4. Enter an optional Where clause to specify the data rows to be included in the second sample.
  5. Click the Next button to select between a hypothesis test and confidence interval computation. The hypothesis test option is selected by default using a null median difference of zero and a not equal alternative hypothesis.
    For a hypothesis test: For a confidence interval
  6. Click the Calculate button to view the results. The output will consist of a table containing the number of observations in the first sample (n1), the number of observations in the second sample (n2), the estimated difference between the medians, the Mann-Whitney statistic (Test Stat.), the P-value for the test and the method used to compute the P-value.
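The Mann-Whitney statistic and the estimated difference can be sketched by comparing every pair of values across the two samples. Whether StatCrunch's Test Stat. corresponds to U or to the rank sum W is not stated above, so both are returned. The estimate shown is the median of all pairwise differences (the Hodges-Lehmann estimate), which is an assumption. Names are hypothetical.

```python
from statistics import median


def mann_whitney(sample1, sample2):
    """Mann-Whitney U and rank-sum W for sample 1, plus the Hodges-Lehmann shift estimate."""
    n1, n2 = len(sample1), len(sample2)
    # U counts pairs where the sample-1 value is larger; ties count one half.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in sample1 for b in sample2)
    w = u + n1 * (n1 + 1) / 2  # sum of sample-1 ranks in the combined data
    estimate = median(a - b for a in sample1 for b in sample2)
    return {"n1": n1, "n2": n2, "U": u, "W": w, "Diff. Est.": estimate}


result = mann_whitney([3, 5, 7], [1, 2, 6])
print(result["U"], result["W"], result["Diff. Est."])  # prints 7.0 13.0 2
```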

Kruskal-Wallis

Provides a hypothesis test for comparing two or more population medians using sample ranks.
  1. Depending on the format of the data, select one of the following options:
  2. Enter an optional Where clause to specify the data rows to be included in the computation.
  3. Click the Calculate button to view the results. The output will consist of a table of information about the sample ranks and the results of the test of the hypothesis that all the medians are equal. Please note that this test is only valid for large samples.
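The Kruskal-Wallis H statistic can be sketched as below. No tie correction is applied (an assumption; tie-corrected versions divide H by an adjustment factor). The large-sample P-value would come from a chi-square distribution with k - 1 degrees of freedom, which is why the test is valid only for large samples. The function name is hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) and its degrees of freedom."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)

    def rank(v):
        # average rank of v in the pooled data (ties share the mean rank)
        less = sum(1 for u in pooled if u < v)
        same = sum(1 for u in pooled if u == v)
        return less + (same + 1) / 2

    rank_sums = [sum(rank(v) for v in g) for g in groups]
    h = 12 / (n_total * (n_total + 1)) * sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups))
    return h - 3 * (n_total + 1), len(groups) - 1


h, df = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 3), df)  # prints 7.2 2
```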

Chi-square goodness of fit test

Provides a chi-square goodness of fit test.
  1. Select the column containing the observed values.
  2. Select the column containing the expected values.
  3. Enter an optional Where clause to specify the data rows to be included in the computation.
  4. Select an optional Group By column to group results. A separate hypothesis test will be computed for each distinct value of the Group By column. Resulting P-values are not adjusted for multiple comparisons.
  5. Click the Calculate button to view the results.
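The goodness-of-fit statistic compares each observed value with its expected value. The sketch assumes the expected column holds counts on the same scale as the observed counts and uses k - 1 degrees of freedom for k categories. The function name is hypothetical.

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic and df = number of categories - 1."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


stat, df = chi_square_gof([18, 22, 20, 40], [25, 25, 25, 25])
print(round(stat, 2), df)  # prints 12.32 3
```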

Control Charts

X-bar

Displays an X-bar chart for monitoring the mean of a process using samples from the process.
  1. Depending on the format of the data, select one of the following options:
  2. Enter an optional Where clause to specify the data rows to be included in the computation.
  3. Click the Next button for the following options:
  4. Click the Calculate button to view the results.

R

Displays an R chart for monitoring the variability of a process using samples from the process.
  1. Depending on the format of the data, select one of the following options:
  2. Enter an optional Where clause to specify the data rows to be included in the computation.
  3. Click the Next button for the following options:
  4. Click the Calculate button to view the results.

X-bar, R

Displays a stacked X-bar and R chart. See the X-bar and R chart sections for help with these options.
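The limits for a stacked X-bar and R chart can be sketched with the standard average-range method. The constants below are the published values for subgroup sizes 2 through 5. Estimating process variability from the average range (rather than, say, from subgroup standard deviations) is an assumption about StatCrunch's default, and the function name is hypothetical. All subgroups are assumed to be the same size.

```python
import statistics

# Standard control-chart constants for subgroup sizes 2-5 (published SPC tables).
A2 = {2: 1.880, 3: 1.023, 4: 0.729, 5: 0.577}
D3 = {2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0}
D4 = {2: 3.267, 3: 2.574, 4: 2.282, 5: 2.114}


def xbar_r_limits(subgroups):
    """(LCL, center, UCL) for the X-bar and R charts, using the average range."""
    n = len(subgroups[0])
    means = [statistics.fmean(s) for s in subgroups]
    ranges = [max(s) - min(s) for s in subgroups]
    xbarbar, rbar = statistics.fmean(means), statistics.fmean(ranges)
    return {
        "xbar": (xbarbar - A2[n] * rbar, xbarbar, xbarbar + A2[n] * rbar),
        "R": (D3[n] * rbar, rbar, D4[n] * rbar),
    }


print(xbar_r_limits([[10, 12, 11], [11, 13, 12], [9, 11, 10]]))
```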

np Chart

Displays an np chart for monitoring the number of defectives produced by a process using samples from the process.
  1. Select the column containing the number of defectives in each sample.
  2. Depending on the format of the data, select one of the following options:
  3. Enter an optional Where clause to specify the data rows to be included in the computation.
  4. Click the Next button for the following options:
  5. Click the Calculate button to view the results.

p Chart

Displays a p chart for monitoring the proportion of defectives produced by a process using samples from the process.
  1. Select the column containing the number of defectives in each sample.
  2. Depending on the format of the data, select one of the following options:
  3. Enter an optional Where clause to specify the data rows to be included in the computation.
  4. Click the Next button for the following options:
  5. Click the Calculate button to view the results.
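The three-sigma limits for p and np charts can be sketched as below, with the lower limit truncated at zero. These are the textbook formulas, not necessarily every option StatCrunch offers, and the function names are hypothetical. For the p chart, limits are recomputed for each sample so that unequal sample sizes are handled.

```python
import math


def p_chart_limits(defectives, sample_sizes):
    """Per-sample (LCL, center, UCL) for a p chart; limits vary when sample sizes differ."""
    p_bar = sum(defectives) / sum(sample_sizes)
    limits = []
    for n in sample_sizes:
        half = 3 * math.sqrt(p_bar * (1 - p_bar) / n)
        limits.append((max(0.0, p_bar - half), p_bar, min(1.0, p_bar + half)))
    return limits


def np_chart_limits(defectives, n):
    """(LCL, center, UCL) for an np chart with a constant sample size n."""
    p_bar = sum(defectives) / (n * len(defectives))
    half = 3 * math.sqrt(n * p_bar * (1 - p_bar))
    return max(0.0, n * p_bar - half), n * p_bar, n * p_bar + half


print(np_chart_limits([4, 6, 5, 5], 100))
```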

c Chart

Displays a c chart for monitoring the number of defects per sample.
  1. Select the column containing the number of defects in each sample.
  2. Enter an optional Where clause to specify the data rows to be included in the computation.
  3. Click the Next button for the following options:
  4. Click the Calculate button to view the results.

u Chart

Displays a u chart for monitoring the number of defects per unit.
  1. Select the column containing the number of defects in each sample.
  2. Depending on the format of the data, select one of the following options:
  3. Enter an optional Where clause to specify the data rows to be included in the computation.
  4. Click the Next button for the following options:
  5. Click the Calculate button to view the results.
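The textbook three-sigma limits for c and u charts can be sketched as follows, with the lower limit truncated at zero. The function names are hypothetical, and `units` (the number of units inspected in each sample) is an assumed input for the u chart.

```python
import math


def c_chart_limits(counts):
    """(LCL, center, UCL) for a c chart of defect counts per sample."""
    c_bar = sum(counts) / len(counts)
    half = 3 * math.sqrt(c_bar)
    return max(0.0, c_bar - half), c_bar, c_bar + half


def u_chart_limits(counts, units):
    """Per-sample (LCL, center, UCL) for a u chart of defects per unit."""
    u_bar = sum(counts) / sum(units)
    limits = []
    for n in units:
        half = 3 * math.sqrt(u_bar / n)
        limits.append((max(0.0, u_bar - half), u_bar, u_bar + half))
    return limits


print(c_chart_limits([3, 5, 4, 4]))  # prints (0.0, 4.0, 10.0)
```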

Calculators

StatCrunch has graphical calculators that can be used to compute probabilities for the distributions listed below:

Resample

StatCrunch provides resampling capabilities that can be used to perform bootstrap and permutation methods for confidence intervals and hypothesis tests.
  1. Select the columns to resample.
  2. Enter an expression for the Statistic to be computed for each resample. Common expressions include mean("Age"), median("Age") and sum("Gender"="F")/10, where Age and Gender are columns to be resampled from the underlying data set. The expression may include columns that are being resampled along with those that are not.
  3. Select the resampling method. To bootstrap, select the with replacement option. To shuffle or permute, select the without replacement option.
  4. Select the type of resampling. A univariate type resamples each selected column independently, one column at a time. A multivariate type resamples entire rows, so values from the selected columns that share a row stay together.
  5. Specify the number of resamples (by default 1000).
  6. Click the Next button for the following options:
  7. Click Resample Statistic to collect the resamples and to produce summary output.
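A minimal sketch of the resampling loop for a single column, using Python's `random` module. `rng.choices` draws with replacement (bootstrap), and `rng.sample` with the full length returns a shuffled copy (permutation). The function name and the percentile interval in the usage lines are illustrative assumptions, not StatCrunch's output. Shuffling a single column leaves statistics such as the mean unchanged; permutation matters when the statistic depends on how a column lines up with another column.

```python
import random
import statistics


def resample_statistic(column, statistic, num_resamples=1000, with_replacement=True, seed=None):
    """Collect statistic(sample) over repeated resamples of one column."""
    rng = random.Random(seed)
    results = []
    for _ in range(num_resamples):
        if with_replacement:
            sample = rng.choices(column, k=len(column))  # bootstrap draw
        else:
            sample = rng.sample(column, k=len(column))  # random permutation
        results.append(statistic(sample))
    return results


ages = [19, 21, 22, 22, 25, 30, 34, 41]
boot_means = resample_statistic(ages, statistics.fmean, seed=1)
cuts = statistics.quantiles(boot_means, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
lo, hi = cuts[0], cuts[-1]  # a 95% percentile bootstrap interval for the mean
print(round(lo, 2), round(hi, 2))
```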