Common statistical software packages differ considerably in their strengths, weaknesses, and usability. The decision about which system fits best should be made with care, because switching to a new system can incur high costs for new licenses and retraining. This article introduces and contrasts the market leaders (R, Python, SAS, SPSS, and STATA) to illustrate their relative pros and cons and make the decision a bit easier.
R
R is a popular, open-source statistics environment that can be extended by packages almost at will. R is commonly used with RStudio, a comfortable development environment that can be used locally or in a client-server installation via a web browser. R applications can also be used directly and interactively on the web via Shiny.
Strengths
- Very large range of functions (well over 10,000 packages on CRAN)
- New statistical methods are quickly implemented
- Very easy to automate and integrate (for example, with Git, LaTeX, ODBC, Oracle R Enterprise, teradataR, Apache Hadoop, MicroStrategy, etc.)
- Very good community support, as well as fee-based support via third-party providers
- Extensive help resources freely available (manuals, tutorials, and so on)
- Very powerful and flexible scripting language (e.g. support of object-oriented programming)
- All common platforms are supported (Windows, Linux, macOS, etc.)
- Future-proof due to very large, active developer community
Weaknesses
- Getting familiar with the R syntax presents a barrier to entry
- The stability and quality of little-used packages are often not as high as those of the core distribution
- Powerful hardware is required when working with very large data sets
Licensing model and cost
R is free and open source; there are no fees for its use.
Conclusion
Originally, R was seen only as a low-cost alternative for those who could not afford a commercial statistics program. R has outgrown this perception and now trumps the commercial competition in terms of functionality, flexibility, and integration with other applications. Many competitors (e.g., SPSS) have reacted by integrating R into their programs. The criticism that R is much harder to learn and use than its commercial competitors is less valid today, given the availability of RStudio. R is a particularly good choice for frequent users who plan to work more extensively with statistics and don’t want to be restricted by their statistical program.
Python
Python is a full-featured, open-source, interpreted programming language that has become a fully fledged alternative for data science projects in recent years. Python is particularly well suited to the Deep Learning and Machine Learning fields, and it also works well as statistics software through packages that are easy to install. A variety of development environments are available, such as Jupyter, Spyder, and PyCharm. Python is a widely used language that is also popular in fields like web development.
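As a rough illustration of Python used as statistics software, the following minimal sketch computes a few descriptive statistics. It relies only on the standard-library `statistics` module so that it runs without installing anything; in practice, third-party packages would typically be used for more extensive analyses. The data values and the `describe` helper are made up for this example and do not come from the original article.

```python
import statistics

# Hypothetical measurements, invented purely for illustration.
measurements = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),    # sample standard deviation (n - 1)
        "pstdev": statistics.pstdev(values),  # population standard deviation (n)
    }


summary = describe(measurements)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The same summary could be produced with a data-frame library once the data grows beyond a simple list; the standard library is used here only to keep the sketch self-contained.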
Strengths
- Powerful, fully-functional programming language
- Supports object-oriented, structured, and functional programming styles
- Mature programming language with established tooling (for example, unit testing and debugging)
- A large number of stable packages in the data science sector and beyond
- Readable, clean syntax
- Constant development by a large developer community
- Full availability of the latest Deep Learning and Machine Learning methods
- Very easy to automate (e.g., via scripts or a web server)
- Integrates well with other tools (Git, Teradata, PySpark, Hadoop, KNIME)
- Extremely good community support from a large and constantly-growing community
- Visualizations are appealing and easy to create
- Professional development environments are available
- Future-proof due to continued growth in use in scientific and commercial fields
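Several of the strengths listed above (support for object-oriented, structured, and functional styles, plus built-in unit testing) can be sketched in a few lines. The example below is only an illustration under stated assumptions: the names `Sample`, `total_functional`, `total_structured`, and `SampleTest` are hypothetical and were chosen for this sketch, and only the standard library is used.

```python
import unittest
from dataclasses import dataclass
from functools import reduce


@dataclass
class Sample:
    """Object-oriented style: a small class wrapping a list of observations."""
    values: list

    def mean(self):
        return sum(self.values) / len(self.values)


def total_functional(values):
    """Functional style: fold the list with reduce instead of a loop."""
    return reduce(lambda acc, x: acc + x, values, 0.0)


def total_structured(values):
    """Structured style: an explicit loop with an accumulator variable."""
    total = 0.0
    for x in values:
        total += x
    return total


class SampleTest(unittest.TestCase):
    """Unit test written with the standard-library unittest framework."""

    def test_styles_agree(self):
        data = [1.0, 2.0, 3.0, 4.0]
        self.assertEqual(total_functional(data), total_structured(data))
        self.assertEqual(Sample(data).mean(), 2.5)


# Run the test case programmatically so the script keeps running afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SampleTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Running the suite through `TextTestRunner` rather than `unittest.main()` is a choice made for this sketch so that it can be pasted into a notebook or script without terminating the interpreter.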
Weaknesses
- Not all statistical methods are available
- Some development environments for statistics are still in their infancy
- High barrier to entry due to being a “full” programming language
Licensing model and cost
Python is free to use. However, in some specialized areas (e.g., text mining), not all packages are licensed for commercial use.
Conclusion
Python stands out in this summary because it is a complete programming language suitable for a wide range of applications. In recent years it has also developed into a serious statistics program, thanks to a large number of high-performance packages, and it is increasing in popularity. In particular, Python is indispensable for methods that originate in computer science, such as Deep Learning. Its advantages are also clear for automation and for interaction with other programs (which can also be written in Python). Learning Python requires being prepared to learn a complete programming language, though many good tutorials and training courses are available thanks to the language’s popularity. A development environment specifically tailored to the data science sector, on the level of RStudio, for example, does not (yet) exist.
See the comparison for SAS, SPSS and STATA on the INWT Statistics website.
“Originally Posted on July 25, 2019 – What’s the Best Statistical Software? A Comparison of R, Python, SAS, SPSS and STATA”
This material is from INWT Statistics and is being posted with its permission.