Intro Primer For WEKA Machine Learning Software
Weka is a machine learning and data mining workbench. The name is an acronym for Waikato Environment for Knowledge Analysis. It contains a collection of visualization tools and algorithms for data analysis and predictive modeling, with graphical user interfaces that make it convenient to experiment with machine learning and data mining models on your own data.
Hoang Pham Truc Phuong is the author of this article and contributes to the Machine Learning column of the RobustTechHouse Blog. RobustTechHouse is a web & mobile app development house focusing on the Financial (Fintech) and ECommerce sectors that also likes to dabble in data analysis and machine learning.
[Updated on 25 May 2015] Also see our follow-up post, Intro Primer To WEKA Explorer For Machine Learning.
Weka supports several standard data mining tasks with many standard data mining algorithms, ranging from simple ones to really complex ones. All of Weka's techniques are predicated on the assumption that the data is available as a single flat file or relation, where each data point is described by a fixed number of attributes. Here are some main features of Weka:
Weka supports various file formats, e.g. CSV and Matlab, as well as its own file format (ARFF). It also supports most common database management systems (DBMS), including HSQL, SQL Server, MySQL and PostgreSQL, through Java database connections (JDBC). For data processing, Weka has over 75 methods for filtering, ranging from basic to advanced operators, e.g. principal component analysis.
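To make the ARFF format more concrete, here is a minimal sketch in plain Python (not part of Weka) that renders a tiny table as ARFF text. The helper name `to_arff` and the weather-style sample data are assumptions made for illustration. Only the basic layout is covered: an `@relation` line, one `@attribute` line per column (either `numeric` or a brace-enclosed list of nominal values), then `@data` followed by comma-separated rows.

```python
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text.

    attributes: list of (name, kind) pairs, where kind is the string
    "numeric" or a list of allowed nominal values.
    """
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, str):
            lines.append(f"@attribute {name} {kind}")
        else:
            # Nominal attribute: list the allowed values inside braces.
            lines.append(f"@attribute {name} {{{','.join(kind)}}}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical sample data, just to show the layout.
weather = to_arff(
    "weather",
    [("outlook", ["sunny", "overcast", "rainy"]),
     ("temperature", "numeric"),
     ("play", ["yes", "no"])],
    [("sunny", 85, "no"), ("overcast", 83, "yes"), ("rainy", 70, "yes")],
)
print(weather)
```

Saving the printed text to a file with the `.arff` extension should give something Weka can load, although values containing spaces or commas would need quoting, which this sketch does not handle.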
Weka has a lot of classification methods. Classifiers can be divided into "Bayesian" methods (Naive Bayes, Bayesian nets etc.), lazy methods (nearest neighbor and variants), rule-based methods (decision tables, OneR, RIPPER), tree learners (C4.5, which Weka implements as J48, Naive Bayes trees, M5 etc.), function-based learners (linear regression, SVMs, Multilayer Perceptron, Gaussian processes) and miscellaneous methods.
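As an illustration of one of the simplest rule-based methods named above, here is a minimal sketch of the OneR idea in plain Python. It is not Weka's implementation: the function name `one_r` and the toy rows are assumptions, and it only handles nominal attributes. For each attribute it builds a rule that predicts the majority class for every attribute value, then keeps the attribute whose rule makes the fewest mistakes on the training rows.

```python
from collections import Counter, defaultdict


def one_r(rows, class_index=-1):
    """Return (attribute_index, rule, errors) for the best one-attribute rule."""
    n_columns = len(rows[0])
    class_index = class_index % n_columns
    best = None
    for index in range(n_columns):
        if index == class_index:
            continue
        # Count the class labels seen for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[index]][row[class_index]] += 1
        # The rule predicts the majority class for each attribute value.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for row in rows if rule[row[index]] != row[class_index])
        if best is None or errors < best[2]:
            best = (index, rule, errors)
    return best


# Hypothetical rows: (outlook, humidity, play).
rows = [
    ("sunny", "high", "no"),
    ("sunny", "normal", "no"),
    ("overcast", "high", "yes"),
    ("overcast", "normal", "yes"),
    ("rainy", "high", "no"),
    ("rainy", "normal", "yes"),
]
attribute, rule, errors = one_r(rows)
print(attribute, rule, errors)
```

On this toy data the rule built on the first column (outlook) makes fewer training errors than the one built on humidity, so it is the one returned.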
Weka has most of the classic clustering algorithms, such as SimpleKMeans, hierarchical clustering and simple expectation maximization (EM).
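For readers new to clustering, the following is a minimal sketch of the k-means procedure in plain Python. It is an illustration under simplifying assumptions, not Weka's SimpleKMeans: points are 2-D, the starting centroids are passed in by hand, and the function name `k_means` is made up for this example. Each pass assigns every point to its nearest centroid and then moves each centroid to the mean of its points, stopping when nothing changes.

```python
def k_means(points, centroids, iterations=10):
    """Cluster 2-D points, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for x, y in points:
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[distances.index(min(distances))].append((x, y))
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append((
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                ))
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

Real implementations usually pick the starting centroids randomly and support any number of dimensions; fixing them here just keeps the example deterministic.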
The set of attributes used is essential for classification performance. Various selection criteria and search methods are available.
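One widely used selection criterion is information gain, which measures how much knowing an attribute reduces the uncertainty (entropy) about the class. The sketch below ranks attributes by information gain in plain Python. The function names and the toy rows are assumptions for illustration, and it does not reproduce any particular Weka evaluator or search method.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute_index, class_index=-1):
    """Entropy of the class minus the entropy left after splitting on the attribute."""
    labels = [row[class_index] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute_index], []).append(row[class_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical rows: (outlook, humidity, play).
rows = [
    ("sunny", "high", "no"),
    ("sunny", "normal", "no"),
    ("overcast", "high", "yes"),
    ("overcast", "normal", "yes"),
]
# Rank the two attribute columns from most to least informative.
ranking = sorted(range(2), key=lambda i: information_gain(rows, i), reverse=True)
print(ranking)
```

In this toy data the first column determines the class completely while the second tells us nothing, so the first column is ranked on top.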
Data can be inspected visually by plotting attribute values against the class, or against other attribute values. Classifier output can be compared to training data in order to detect outliers and observe classifier characteristics and decision boundaries. For specific methods, there are specialized tools for visualization, such as a tree viewer for any method that produces classification trees, a Bayes network viewer with automatic layout, and a dendrogram viewer for hierarchical clustering.
This is a new function in Weka from version 3.7.x (the developer version). Weka supports many methods for predicting time series, such as function-based learners (Gaussian processes, linear regression, Multilayer Perceptron neural network, SMOreg support vector machine for regression), lazy methods (K-nearest neighbours, locally weighted learning and KStar) and trees (random forest, random tree).
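Ordinary learners such as linear regression can forecast a series once the time dependency is encoded as extra input columns, often called lagged variables. The plain-Python sketch below shows that transformation only. The function name `make_lagged` and the sales numbers are assumptions for illustration, and the sketch does not claim to mirror how the Weka forecasting package builds its inputs internally.

```python
def make_lagged(series, n_lags):
    """Return (features, targets): each target paired with its n_lags predecessors."""
    features, targets = [], []
    for t in range(n_lags, len(series)):
        features.append(series[t - n_lags:t])  # the n_lags values just before t
        targets.append(series[t])              # the value to predict
    return features, targets


# Hypothetical monthly sales figures.
sales = [10, 12, 13, 15, 18, 21]
features, targets = make_lagged(sales, n_lags=2)
for x, y in zip(features, targets):
    print(x, "->", y)
```

Each printed pair is one training example: the two previous values are the inputs and the next value is the target, so any regression method can be trained on them.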
From my experience, here are some reasons which make Weka a good toolbox for Machine Learning:
1. Easy to use graphical user interfaces.
2. Contains most of the powerful algorithms published for machine learning.
3. Free availability under the GNU General Public License.
4. Portability, since it is fully implemented in the Java programming language and runs on almost any modern computing platform.
5. A comprehensive collection of data pre-processing and modelling techniques.
Download Weka from the Weka download link. There are two versions of Weka: the stable version (3.6.12) and the developer version (3.7.12). I personally prefer the developer version because it allows me to install more packages, e.g. time series forecasting.
After downloading, unzip the zip file and run this command:
> java -Xmx1000M -jar weka.jar
Weka version 3.6 (Stable Version)
Weka Version 3.7 (Developer Version)
The above shows the subtle differences between the standard and developer versions.
To connect to a DBMS, follow these steps:
1. Download the Java connector (JDBC driver) compatible with your DBMS, e.g. mysql-connector-java, sql-connector-java
2. Use this syntax to run Weka with the DBMS:
> java -Xmx1000M -cp weka_path:java_connection_path weka.gui.GUIChooser
Here is the example I used to connect to MySQL:
> java -Xmx1000M -cp /home/phuong/weka-3-7-12/weka.jar:/home/phuong/java_conn/mysql-connector-java-5.1.34-bin.jar weka.gui.GUIChooser
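The command above uses `:` to separate classpath entries, which is the convention on Linux and macOS; on Windows, Java expects `;` instead. The following Python sketch builds the same launch command and picks the separator for the current platform by default. The helper name `weka_command` is an assumption for illustration, and the paths are simply the ones from the example above.

```python
import os


def weka_command(weka_jar, driver_jar, heap="1000M", separator=os.pathsep):
    """Build the argument list that launches the Weka GUI chooser with a JDBC driver."""
    classpath = separator.join([weka_jar, driver_jar])
    return ["java", f"-Xmx{heap}", "-cp", classpath, "weka.gui.GUIChooser"]


command = weka_command(
    "/home/phuong/weka-3-7-12/weka.jar",
    "/home/phuong/java_conn/mysql-connector-java-5.1.34-bin.jar",
    separator=":",  # forced here so the output matches the Linux example
)
print(" ".join(command))
# To actually launch Weka: import subprocess; subprocess.run(command, check=True)
```

Leaving out the `separator` argument lets `os.pathsep` choose `:` or `;` for whatever system the script runs on.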
In Weka Explorer, you can visualize and clean your data and try some algorithms for clustering, classification and forecasting. Some features differ between the stable and developer versions of Weka. Here, I am using "Weka Explorer" in the developer version.
The Explorer interface is divided into 11 tabs on two tab lines: the top line contains 5 features and the other contains 6. The top line is only available in the developer version.
RConsole: An extension which combines Weka with the R language and reuses a lot of the awesome functions from R.
Parallel Coordinates Plot: A common way of visualizing high-dimensional geometry and analyzing multivariate data.
Parallel Coordinates Plot
Projection Plot: Apply algorithms such as clustering algorithms and visualize the results directly on the graph.
Visualize 3D: Plot your data in 3D space!
Forecasting: This function is used for time series forecasting. You will find some well-known algorithms here, such as SVM and regression.
Forecast: Output & Evaluation
Preprocess: Load a dataset and manipulate the data into a form that you want to work with.
Classify: Select and run classification and regression algorithms to operate on your data.
Cluster: Select and run clustering algorithms on your dataset.
Associate: Run association algorithms to extract insights from your data.
Select Attributes: Run attribute selection algorithms on your data to select those attributes that are relevant to the feature you want to predict.
Visualize: Visualize the relationship between attributes.
Unlike Weka Explorer, which is used for analysis and experimenting with algorithms, "Weka Experimenter" is for designing experiments with your selection of algorithms and datasets, running the experiments and analyzing the results. For example, the user can create an experiment that runs several schemes against a series of datasets and then analyse the results to determine if one of the schemes is statistically better than the other schemes.
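To give a feel for what "statistically better" means, here is a minimal sketch of a plain paired t-test on the accuracies of two schemes over the same runs, written in Python. The accuracy numbers are hypothetical and the function name is an assumption. The Experimenter provides its own significance testers, which this sketch does not try to reproduce.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """t statistic for the per-run differences between two schemes."""
    differences = [a - b for a, b in zip(scores_a, scores_b)]
    standard_error = stdev(differences) / sqrt(len(differences))
    return mean(differences) / standard_error


# Hypothetical accuracies (%) of two schemes on the same five runs.
scheme_a = [91.0, 89.5, 92.0, 90.5, 91.5]
scheme_b = [88.0, 88.5, 89.0, 88.0, 89.5]
t = paired_t_statistic(scheme_a, scheme_b)
print(round(t, 2))
```

With five runs there are four degrees of freedom, for which the two-sided 5% critical value is about 2.776. A t statistic larger than that in absolute value suggests the difference between the two schemes is unlikely to be due to chance alone.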
Knowledge Flow helps you create a process to apply machine learning. It helps you graphically design your process and run the design that you created. The analysis process goes like this: loading and transforming of input data, followed by running of algorithms and then presentation of results.
Weka Knowledge Flow Environment
You can review some links below for more information about Weka.
An Introduction To The WEKA Data Mining System
More Data Mining With Weka
Here we provided an Intro Primer For WEKA Machine Learning Software. Hope you found it useful.
Intro Primer For WEKA Machine Learning Software was originally published on RobustTechHouse - Mobile App Development Singapore