Stata statistical software provides everything you need for data science and inference:

Data manipulation, exploration, visualisation, statistics, reporting, and reproducibility.

Linear models

regression • censored outcomes • endogenous regressors • bootstrap, jackknife, and robust and cluster–robust variance • wild cluster bootstrap • instrumental variables • three-stage least squares • constraints • quantile regression • GLS • DID • more

Time series

ARIMA • ARFIMA • ARCH/GARCH • VAR • VECM • multivariate GARCH • unobserved-components model • dynamic factors • state-space models • Markov-switching models • business calendars • tests for structural breaks • threshold regression • forecasts • impulse–response functions • local projections • unit-root tests • filters and smoothers • rolling and recursive estimation • GLS • Bayesian • more

Data manipulation

data transformations • data frames • match-merge • import/export data • JDBC • ODBC • SQL • Unicode • by-group processing • append files • sort • row–column transposition • labelling • save results • more

Panel/longitudinal data

random and fixed effects with robust standard errors • linear mixed models • random-effects probit • GEE • random- and fixed-effects Poisson • dynamic panel-data models • instrumental variables • DID • panel unit-root tests • more

Survival analysis

Kaplan–Meier and Nelson–Aalen estimators • Cox regression (frailty) • parametric models (frailty, random effects) • competing risks • hazards • time-varying covariates • left-, right-, and interval-censoring • Weibull, exponential, and Gompertz models • more

Reporting

reproducible reports • customisable tables • graphical tables builder • Word • Excel • PDF • HTML • dynamic documents • Markdown • Stata results and graphs • SVG • EPS • PNG • TIF • more

Multilevel mixed-effects models

continuous, binary, count, and survival outcomes • two-, three-, and higher-level models • generalized linear models • nonlinear models • random intercepts • random slopes • crossed random effects • BLUPs of effects and fitted values • hierarchical models • residual error structures • DDF adjustments • support for survey data • more

Bayesian analysis

thousands of built-in models • univariate and multivariate models • linear and nonlinear models • panel data • multilevel models • VAR • DSGE • continuous, binary, ordinal, and count outcomes • bayes: prefix for 58 estimation commands • continuous univariate, multivariate, and discrete priors • add your own models • multiple chains • convergence diagnostics • posterior summaries • hypothesis testing • model fit • model comparison • predictions • dynamic forecast • impulse–response functions • more

Bayesian model averaging

full enumeration • MC3 and MH sampling • three model prior classes • fixed and random g-priors for coefficients • heredity rules • PIP for predictor • model ranking by PMP • BMA convergence • variable-inclusion maps • model-size distribution plots • jointness measures • log predictive-score • predictions • more

Graphics

lines • bars • areas • ranges • contours • confidence intervals • interaction plots • survival plots • publication quality • customize anything • Graph Editor • more

Binary, count, and limited outcomes

logistic, probit, tobit • Poisson and negative binomial • conditional, multinomial, nested, ordered, rank-ordered, and stereotype logistic • multinomial probit • zero-inflated and left-truncated count models • selection models • marginal effects • more

Meta-analysis

effect sizes • common, fixed, and random effects • forest, funnel, and other plots • subgroup and cumulative analysis • leave-one-out • meta-regression • small-study effects • publication bias • multivariate • multilevel • more

Programming features

adding new commands • scripting • object-oriented programming • menu and dialog-box programming • dynamic documents • Markdown • Project Manager • Python integration • PyStata • Jupyter notebook • Java integration • Java plugins • H2O access • C/C++ plugins • more

Choice models

discrete choice • rank-ordered alternatives • conditional logit • multinomial probit • nested logit • mixed logit • panel data • case-specific and alternative-specific predictors • interpret results—expected probabilities, covariate effects, comparisons across alternatives • more

Power, precision, and sample size

power • sample size • effect size • minimum detectable effect • CI width • means • proportions • variances • correlations • ANOVA • regression • cluster randomized designs • case–control studies • cohort studies • contingency tables • survival analysis • balanced or unbalanced designs • results in tables or graphs • group sequential designs for clinical trials • more

Mata—Stata's serious programming language

interactive sessions • large-scale development projects • optimization • matrix inversions • decompositions • eigenvalues and eigenvectors • LAPACK engine • Intel® MKL • real and complex numbers • string matrices • interface to Stata datasets and matrices • numerical derivatives • object-oriented programming • more

Extended regression models (ERMs)

endogenous covariates • sample selection • nonrandom treatment • panel data • account for problems alone or in combination • continuous, interval-censored, binary, and ordinal outcomes • more

Causal inference / Treatment effects

inverse-probability weighting (IPW) • doubly robust methods • propensity-score matching • regression adjustment • covariate matching • DID • multilevel treatments • endogenous treatments • average treatment effects (ATEs) • ATEs on the treated (ATETs) • potential-outcome means (POMs) • continuous, binary, count, fractional, and survival outcomes • panel data • lasso • causal mediation analysis • more

Graphical user interface

menus and dialogs for all features • Data Editor • Variables Manager • Graph Editor • Project Manager • Do-file Editor • multiple preference sets • more

Generalized linear models (GLMs)

ten link functions • user-defined links • seven distributions • ML and IRLS estimation • nine variance estimators • seven residuals • more

Lasso

lasso • elastic net • model selection • prediction • inference • continuous, binary, and count outcomes • cross-validation • adaptive lasso • double selection • partialing out • cross-fit partialing out • double machine learning • endogenous covariates • treatment effects • more

Documentation

35 manuals • 18,000+ pages • seamless navigation • thousands of worked examples • quick starts • methods and formulas • references • more

Finite mixture models (FMMs)

fmm: prefix for 17 estimators • mixtures of a single estimator • mixtures combining multiple estimators or distributions • continuous, binary, count, ordinal, categorical, censored, truncated, and survival outcomes • more

SEM (structural equation modeling)

graphical path diagram builder • standardized and unstandardized estimates • modification indices • direct and indirect effects • continuous, binary, count, ordinal, and survival outcomes • multilevel models • random slopes and intercepts • factor scores, empirical Bayes, and other predictions • groups and tests of invariance • goodness of fit • handles MAR data by FIML • correlated data • survey data • more

Basic statistics

summaries • cross-tabulations • correlations • z and t tests • equality-of-variance tests • tests of proportions • confidence intervals • factor variables • more

Spatial autoregressive models

spatial lags of dependent variable, independent variables, and autoregressive errors • fixed and random effects in panel data • endogenous covariates • analyze spillover effects • more

Latent class analysis

binary, ordinal, continuous, count, categorical, fractional, and survival items • add covariates to model class membership • combine with SEM path models • expected class proportions • goodness of fit • predictions of class membership • more

Nonparametric methods

nonparametric regression • Wilcoxon–Mann–Whitney, Wilcoxon signed-rank, and Kruskal–Wallis tests • Cochran–Armitage and other trend tests • Spearman and Kendall correlations • Kolmogorov–Smirnov tests • exact binomial CIs • survival data • ROC analysis • smoothing • bootstrapping • more

ANOVA/MANOVA

balanced and unbalanced designs • factorial, nested, and mixed designs • repeated measures • marginal means • contrasts • more

Multiple imputation

nine univariate imputation methods • multivariate normal imputation • chained equations • explore pattern of missingness • manage imputed datasets • fit model and pool results • transform parameters • joint tests of parameter estimates • predictions • more

Nonlinear regression, GMM and other systems of equations

generalized method of moments (GMM) • nonlinear regression • demand systems • more

Exact statistics

exact logistic and Poisson regression • exact case–control statistics • binomial tests • Fisher’s exact test for r × c tables • more

Survey methods

multistage designs • bootstrap, BRR, jackknife, linearized, and SDR variance estimation • poststratification • raking • calibration • DEFF • predictive margins • means, proportions, ratios, totals • summary tables • almost all estimators supported • more

Simple maximum likelihood

specify likelihood using simple expressions • no programming required • survey data • standard, robust, bootstrap, and jackknife SEs • matrix estimators • more

Epidemiology

standardization of rates • case–control • cohort • matched case–control • Mantel–Haenszel • pharmacokinetics • ROC analysis • ICD-10 • additive models of risk • more

Cluster analysis

hierarchical clustering • kmeans and kmedian nonhierarchical clustering • dendrograms • stopping rules • user-extensible analyses • more

Network analysis

nwcommands: import and manipulate networks • generate networks • calculate centrality and dissimilarity measures • visualise networks • more

Programmable maximum likelihood

user-specified functions • NR, DFP, BFGS, BHHH • OIM, OPG, robust, bootstrap, and jackknife SEs • Wald tests • survey data • numeric or analytic derivatives • more

DSGE models

specify models algebraically • solve models • estimate parameters • identification diagnostics • policy and transition matrices • IRFs • dynamic forecasts • Bayesian • more

IRT (item response theory)

binary (1PL, 2PL, 3PL), ordinal, and categorical response models • item characteristic curves • test characteristic curves • item information functions • test information functions • multiple-group models • differential item functioning (DIF) • more

Other statistical methods

kappa measure of interrater agreement • Cronbach's alpha • stepwise regression • tests of normality • more

Tests, predictions, and effects

Wald tests • LR tests • linear and nonlinear combinations • predictions and generalized predictions • marginal means • least-squares means • adjusted means • marginal and partial effects • forecast models • Hausman tests • more

Multivariate methods

factor analysis • principal components • discriminant analysis • rotation • multidimensional scaling • Procrustean analysis • correspondence analysis • biplots • dendrograms • user-extensible analyses • more

Functions

statistical • random-number • mathematical • string • date and time • regular expressions • Unicode • more

Contrasts, pairwise comparisons, and margins

compare means, intercepts, or slopes • compare with reference category, adjacent category, grand mean, etc. • orthogonal polynomials • multiple-comparison adjustments • graph estimated means and contrasts • interaction plots • more

Internet capabilities

search and download thousands of community-contributed features • web updating • web file sharing • latest Stata news • more

Resampling and simulation methods

bootstrap • jackknife • Monte Carlo simulation • permutation tests • exact p-values • more

Community-contributed commands

search and download thousands of free additions • discover new features in the Stata Journal • share commands by posting to the SSC • discuss community-contributed commands on Statalist • more

Installation Qualification

IQ report for regulatory agencies such as the FDA • installation verification • more

FDA Compliance

adherence to FDA regulatory requirements for statistical software • more

Accessibility

Section 508 compliance • accessibility for persons with disabilities • more

New in Stata 18

Bayesian model averaging • causal mediation analysis • tables of descriptive statistics • heterogeneous DID • group sequential designs • multilevel meta-analysis • meta-analysis for prevalence • robust inference for linear models • wild cluster bootstrap • more