Data Mining: R, Python, RapidMiner or Matlab

Posted in Architecture. Tagged: R, data mining, matlab, python, scipy

In work and study, more and more of the problems we run into can be solved with data mining.

When it comes to data mining tools, there are plenty to choose from, such as R, SPSS, and Mathematica. The most popular packages in industry are SAS and SPSS, but they are quite expensive, so you may want a free tool, ideally an open source one. This post compares four options:

  • R
  • RapidMiner
  • Python
  • Matlab

Here is a poll from KDnuggets

KDnuggets Annual Software Poll: Data Mining

1. Python

  • Interactive scripting language
  • Open source, with lots of libraries such as NLTK, SciPy, and scikit-learn
  • Good for mathematics
  • Limited list of machine learning algorithms
  • Machine learning is not handled uniformly across the different libraries
  • Model training is somewhat less efficient than in C
  • Easy to integrate into a workflow with other programs, since the other work may already be done in Python
  • Runs on Windows, Linux, OS X, Android, etc.
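To give a feel for the scripting style, here is a minimal sketch of a k-nearest-neighbour classifier using only the standard library. The function name `knn_predict` and the toy data are invented for illustration; for real work you would more likely use one of the libraries listed above.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs, where features are
    equal-length tuples of numbers.
    """
    # Sort training points by Euclidean distance to the query, keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels of those neighbours and return the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data invented for illustration: two well-separated clusters.
train = [
    ((1.0, 1.1), "a"), ((0.9, 1.0), "a"), ((1.2, 0.8), "a"),
    ((5.0, 5.2), "b"), ((4.8, 5.1), "b"), ((5.3, 4.9), "b"),
]

print(knn_predict(train, (1.0, 0.9)))
print(knn_predict(train, (5.1, 5.0)))
```

The same few lines can be typed straight into the interactive interpreter, which is a large part of Python's appeal for exploratory work.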

2. R

  • Open source
  • Interactive
  • Steep learning curve: the language is quite different from current mainstream languages like C, C#, C++, Java, PHP and VB
  • Very extensive statistical library
  • Very concise for solving statistical problems
  • A powerful, elegant array language in the tradition of APL, Mathematica and MATLAB, but also LISP/Scheme
  • Easy to integrate into a workflow with your other programs: you just spawn an R process, pass input in, and read output from a pipe
  • R dates back to the early 1990s (first released in 1993)
  • Less specialized towards data mining
  • Runs on Windows, Linux, and OS X
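The "spawn R and talk to it over a pipe" idea can be sketched from Python as follows. This is only an illustration under assumptions: it assumes `Rscript` is installed and on the PATH, and both the helper name `pipe_through` and the one-line R script are made up for this example.

```python
import shutil
import subprocess
import sys


def pipe_through(cmd, text):
    """Run `cmd`, write `text` to its stdin, and return what it prints to stdout."""
    result = subprocess.run(
        cmd, input=text, capture_output=True, text=True, check=True
    )
    return result.stdout


# Hypothetical one-line R script: read numbers from stdin and print their mean.
R_CMD = [
    "Rscript", "-e",
    'x <- scan(file("stdin"), quiet = TRUE); cat(mean(x), "\\n")',
]

if __name__ == "__main__":
    if shutil.which("Rscript") is None:
        # Assumption not met: R is not installed, so skip the call.
        print("Rscript not found on PATH; skipping the R call", file=sys.stderr)
    else:
        print(pipe_through(R_CMD, "1 2 3 4\n"))
```

Because `pipe_through` only deals with stdin and stdout, the same helper works for any command-line program, not just R.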

Link: Screencast showing how a trained R user can generate a PMML neural network model in 60 seconds.

3. RapidMiner

  • An open source statistical and data mining package
  • Written in Java, so it is easy to port across operating systems
  • Lots of data mining algorithms
  • Good graphics
  • Easily reads and writes Excel files and various databases
  • You program by piping components together in a graphical ETL workflow
  • If you set up an illegal workflow, RapidMiner suggests Quick Fixes to make it legal
  • Hard to learn, but there are good video tutorials
  • Can work with R and SPSS

4. Matlab

  • Good at matrices and plotting
  • Easy to program; the language grammar is similar to C
  • Very large and very expensive

References:

1. http://blog.samibadawi.com/2010/06/orange-r-rapidminer-statistica-and-weka.html
2. http://blog.samibadawi.com/2010/04/r-rapidminer-statistica-ssas-or-weka.html
3. http://orange.biolab.si/
4. http://rapid-i.com/
5. http://cos.name/
6. http://r-ke.info/
7. http://www.kdnuggets.com/2013/06/kdnuggets-annual-software-poll-rapidminer-r-vie-for-first-place.html
