Comparison of data analysis packages: R, Matlab, SciPy, Excel, SAS, SPSS, Stata

Lukas and I were trying to write a succinct comparison of the most popular packages that are typically used for data analysis. I think most people choose one based on what people around them use or what they learn in school, so I've found it hard to find comparative information. I'm posting the table here in hopes of useful comments.

Name                   | Advantages                                    | Disadvantages                             | Open source? | Typical users
R                      | Library support; visualization                | Steep learning curve                      | Yes          | Finance; Statistics
Matlab                 | Elegant matrix support; visualization         | Expensive; incomplete statistics support  | No           | Engineering
SciPy/NumPy/Matplotlib | Python (general-purpose programming language) | Immature                                  | Yes          | Engineering
Excel                  | Easy; visual; flexible                        | Large datasets                            | No           | Business
SAS                    | Large datasets                                | Expensive; outdated programming language  | No           | Business; Government
Stata                  | Easy statistical analysis                     |                                           | No           | Science

SPSS: like Stata but more expensive and worse.

70. SAS, SPSS, and Stata.

There's a bunch more to be said for every cell. Among other things:

Two big divisions on the table: the more programming-oriented solutions are R, Matlab, and Python. More analytic solutions are Excel, SAS, Stata, and SPSS.

Python "immature": matplotlib, numpy, and scipy are all separate libraries that don't always get along. Why does matplotlib come with pylab, which is supposed to be a unified namespace for everything? Isn't scipy supposed to do that? Why is there duplication between numpy and scipy (e.g. …)? And then there's package compatibility version hell. You can use SAGE or Enthought, but neither is standard yet. In terms of functionality and approach, SciPy is closest to Matlab, but it feels much less mature.

Matlab's language is certainly weak.
It sometimes doesn't seem to be much more than a scripting language wrapping the matrix libraries. Python is clearly better on most counts. R's is surprisingly good (Scheme-derived, smart use of named args, etc.). Everyone says SAS is very bad.

Matlab is the best for developing new mathematical algorithms. Very popular in machine learning.

I've never used the Matlab Statistical Toolbox. I'm wondering, how good is it compared to R?

Here's an interesting reddit thread on SAS/Stata vs R.

SPSS and Stata in the same category: they seem to have a similar role, so we threw them together. Stata is a lot cheaper than SPSS, people usually seem to like it, and it seems popular for introductory courses. I personally haven't used either.

SPSS and Stata for "Science": we've seen biologists and social scientists use lots of Stata and SPSS. My impression is they get used by people who want the easiest way possible to do the sort of standard statistical analyses that are very orthodox in many academic disciplines (ANOVA, multiple regressions, t- and chi-squared significance tests, etc.). Certain types of scientists, like physicists, computer scientists, and statisticians, often do weirder stuff that doesn't fit into these traditional methods.

Another important thing about SAS, from my perspective at least, is that it's used mostly by an older crowd. I know dozens of people under 3… SAS. At that R meetup last week, Jim Porzak asked the audience if there were any recent grad students who had learned R in school. Many hands went up. Then he asked if SAS was even offered as an option. All hands went down.
There were boatloads of SAS representatives at that conference and they sure didn't seem to be on the leading edge.

But is there ANY package besides SAS that can do analysis for datasets that don't fit into memory? That is, ones that mostly have to stay on disk? And exactly how good are SAS's capabilities here anyway?

If your dataset can't fit on a single hard drive and you need a cluster, none of the above will work. There are a few multi-machine data processing frameworks that are somewhat standard (e.g. Hadoop, MPI), but it's an open question what the standard distributed data analysis framework will be. Hive? Pig? Or quite possibly something else.

This was an interesting point at the R meetup. Porzak was talking about how going to MySQL gets around R's in-memory limitations. But Itamar Rosenn and Bo Cowgill (Facebook and Google respectively) were talking about multi-machine datasets that require cluster computation that R doesn't come close to touching, at least right now. It's just a whole different ballgame with that large a dataset.

SAS people complain about poor graphing capabilities.

R vs. Matlab visualization support is controversial. One view I've heard is that R's visualizations are great for exploratory analysis, but you want something else for very high quality graphs. Matlab's interactive plots are super nice though. Matplotlib follows the Matlab model, which is fine, but is uglier than either IMO.

Excel has a far, far larger user base than any of these other options. That's important to know. I think it's underrated by computer scientist sort of people. But it does massively break down at 1…

Another option: Fortran and C/C++. They are super fast and memory efficient, but tricky and error-prone to code; you have to spend lots of time mucking around with I/O, and they have zero visualization and data management support. Most of the packages listed above run Fortran numeric libraries for the heavy lifting.

Another option: Mathematica.
I get the impression it's more for theoretical math, not data analysis. Can anyone prove me wrong?

Another option: the pre-baked data mining packages. The open source ones I know of are Weka and Orange. I hear there are zillions of commercial ones too. Jerome Friedman, a big statistical learning guy, has an interesting complaint that they should focus more on traditional things like significance tests and experimental design.

Here's the article that inspired this rant.

I think knowing where the typical users come from is very informative for what you can expect to see in the software's capabilities and user community. I'd love more information on this for all these options. What do people think?

Aug 2… Serbo-Croatian translation. Apr 2015 update: Slovenian translation. May 2017 update: Portuguese translation.
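
A footnote to the point above about going to a database to get around in-memory limits: here is a rough sketch of the idea in Python. It is only an illustration under some assumptions. The standard-library sqlite3 module stands in for MySQL, and the table name (measurements), its columns (grp, value), and the helper summarize_by_group are all made up for this example. The point is that the database does the grouping and only the small summary comes back into the analysis environment.

```python
import sqlite3


def summarize_by_group(conn):
    """Return (group, count, mean) rows; the aggregation runs inside the database."""
    query = """
        SELECT grp, COUNT(*) AS n, AVG(value) AS mean
        FROM measurements
        GROUP BY grp
        ORDER BY grp
    """
    return conn.execute(query).fetchall()


# ":memory:" keeps this sketch self-contained; a file path would be used
# for data that is meant to stay on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (grp TEXT, value REAL)")  # hypothetical schema
rows = [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 20.0), ("b", 30.0)]
conn.executemany("INSERT INTO measurements VALUES (?, ?)", rows)
conn.commit()

summary = summarize_by_group(conn)
for grp, n, mean in summary:
    print(grp, n, mean)
```

This only helps when the analysis can be phrased as queries the database knows how to run. It does nothing for the cluster-sized datasets mentioned above.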