Machine Learning is the sub-field of Artificial Intelligence concerned with algorithms that allow computers to learn. These days it is widely used in areas such as Natural Language Processing, biotechnology, financial fraud detection, product marketing, stock market analysis, and many more.
What is Machine Learning?
In simple terms, Machine Learning means making computers (machines) learn from data and come up with their own logic.
Machine Learning can be easily understood from the definition given by Tom Mitchell in 1997:
"A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E." — Tom Mitchell
Some of the applications of Machine Learning are:
- Natural Language Processing
- Online Advertising
- Recommendation System
- Search Engines
- Stock Market Analysis
- Customer/User behavior Analysis
- Fraud Detection
- Medical diagnosis
- Speech and Text Recognition
- Pattern Recognition, etc.
Steps used in Machine Learning:
- Data Collection: The first step is to identify the data sources (spreadsheets, databases, binary files, or big data systems) and gather the data from them.
- Data Preparation: This step cleans the data and checks its quality: missing values, outliers, accuracy, relevancy, and so on.
- Training the Model: In this step we choose an appropriate algorithm and represent the data in the form of a model. We split the clean data into two parts, train and test. The training data is used for building the model, and the test part is held back for evaluation.
- Testing the Model: We use the test part of the data to measure the accuracy of the model. Accuracy is best judged on data that was not used in building the model.
- Improving the Performance: This step involves choosing different variables altogether, or introducing more variables, to improve the model's efficiency.
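The steps above can be sketched end to end in a few lines. This is a minimal, purely illustrative sketch: the dataset and the single-threshold "model" are invented for the example, not a real algorithm.

```python
# A minimal sketch of the collect / prepare / train / test workflow,
# using a toy dataset and a one-threshold "model" (illustration only).
import random

random.seed(0)

# Data collection: toy (feature, label) pairs; label is 1 when x > 5,
# which is exactly the pattern the "model" has to recover.
data = [(x, 1 if x > 5 else 0) for x in range(10)]

# Data preparation: shuffle, then split into train and test parts.
random.shuffle(data)
train, test = data[:7], data[7:]

def accuracy(threshold, rows):
    # Fraction of rows where "x > threshold" matches the label.
    return sum((x > threshold) == bool(y) for x, y in rows) / len(rows)

# Training: pick the threshold that scores best on the train split.
best = max((t for t, _ in train), key=lambda t: accuracy(t, train))

# Testing: evaluate on data the model has never seen.
print("test accuracy:", accuracy(best, test))
```

Real projects swap the threshold search for a proper learning algorithm, but the split-train-evaluate shape stays the same.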
Types of Machine Learning Algorithms
- Supervised Learning: In supervised learning, each example has an input object (vector) and a desired output value (supervisory signal). Using these values, we generate a function that maps inputs to the desired outputs. Training continues until the model achieves the desired level of accuracy on the training data. Some examples of supervised learning methods are Random Forests, Bayesian statistics, decision tree learning, regression, etc.
- Unsupervised Learning: In unsupervised learning, we do not have any output value to train and check the model against. The examples given to the model are unlabeled (without outputs), so the algorithm has to draw inferences from the input data alone. The most common method for unsupervised learning is cluster analysis (finding hidden patterns, grouping or segmenting the data, etc.).
- Reinforcement Learning: This type of algorithm allows a machine or software agent to automatically determine the ideal behavior within a specific context, in order to improve its performance; that behavior is learned from rewards (feedback from the environment).
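The difference between the first two settings is visible in how the examples themselves are stored. A tiny sketch, with invented values:

```python
# Illustration only: the same observations as a supervised (labeled)
# dataset and as an unsupervised (unlabeled) one.

# Supervised: each example pairs an input vector with a desired output.
labeled = [([1.0, 2.0], "spam"), ([3.0, 0.5], "ham")]

# Unsupervised: inputs only; the algorithm must find structure itself.
unlabeled = [x for x, _ in labeled]

print(unlabeled)  # [[1.0, 2.0], [3.0, 0.5]]
```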
Some famous algorithms of Machine Learning
- K Means Clustering Algorithm: K-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.
- Support Vector Machine Algorithm: SVM (also support vector networks) are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis.
- Naive Bayes Classifier Algorithm: Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes’ theorem with strong (naive) independence assumptions between the features.
- Apriori Algorithm: Apriori is an algorithm for frequent item set mining and association rule learning over transactional databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database.
- Logistic Regression: A regression model from statistics used to predict categorical variables: binary (yes/no), where the variable has one of two possible categories, or multinomial, where there can be more than two categories.
- Linear Regression: It is an approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted X.
- Artificial Neural Networks: A computational model based on the structure and functions of biological neural networks. Information that flows through the network affects the structure of the ANN, because a neural network changes, or learns in a sense, based on its inputs and outputs.
- Nearest Neighbors: It is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space.
- Decision Trees: A decision tree is a flow-chart-like structure, where each internal (non-leaf) node denotes a test on an attribute, each branch represents the outcome of a test, and each leaf (or terminal) node holds a class label. The topmost node in a tree is the root node.
- Random Forests: An ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees.
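The K-means bullet above (assign each point to the nearest mean, then move each mean) can be sketched in one dimension. The points and starting centroids are made-up illustration values:

```python
# Minimal 1-D k-means (Lloyd's algorithm), for illustration only.
def kmeans_1d(points, centroids, iterations=10):
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest mean.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
print(kmeans_1d(points, centroids=[0.0, 10.0]))  # centroids near [1.0, 8.0]
```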
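The Naive Bayes bullet can be illustrated by applying Bayes' theorem directly to counted frequencies. The tiny "weather" training rows below are invented for the example:

```python
# Hedged sketch of a categorical Naive Bayes classifier (toy data).
from collections import Counter

# Each row: (features, label). Features are assumed conditionally
# independent given the label -- the "naive" assumption.
rows = [(("sunny", "hot"), "play"),
        (("sunny", "mild"), "play"),
        (("rainy", "hot"), "stay"),
        (("rainy", "mild"), "stay")]

labels = Counter(label for _, label in rows)

def likelihood(value, position, label):
    # P(feature = value | label), estimated by counting.
    feats = [f for f, l in rows if l == label]
    return sum(f[position] == value for f in feats) / len(feats)

def predict(features):
    def score(label):
        p = labels[label] / len(rows)          # prior P(label)
        for i, v in enumerate(features):
            p *= likelihood(v, i, label)       # naive independence
        return p
    return max(labels, key=score)

print(predict(("sunny", "hot")))  # play
```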
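Apriori's level-wise idea, frequent single items extended to larger item sets that still meet the support threshold, can be sketched on a few made-up transactions:

```python
# Sketch of Apriori over toy transactions (illustration only).
from itertools import combinations

transactions = [{"bread", "milk"},
                {"bread", "butter"},
                {"bread", "milk", "butter"},
                {"milk"}]

def support(itemset):
    # Number of transactions containing the whole itemset.
    return sum(itemset <= t for t in transactions)

min_support = 2
items = {i for t in transactions for i in t}

# Level 1: frequent individual items.
frequent = [{i} for i in items if support({i}) >= min_support]

# Level 2: extend to pairs that are still frequent.
pairs = [set(c) for c in combinations(sorted(items), 2)
         if support(set(c)) >= min_support]

print(sorted(sorted(p) for p in pairs))  # [['bread', 'butter'], ['bread', 'milk']]
```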
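The binary case of logistic regression can be sketched with plain gradient descent on a one-feature toy dataset (all values invented for illustration):

```python
# Minimal binary logistic regression fit by stochastic gradient descent.
import math

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 0, 1, 1, 1]           # category flips between x=2 and x=3

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b = 0.0, 0.0
for _ in range(5000):
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)    # predicted P(y = 1 | x)
        w -= 0.1 * (p - y) * x    # gradient of the log-loss w.r.t. w
        b -= 0.1 * (p - y)        # gradient of the log-loss w.r.t. b

def predict(x):
    return int(sigmoid(w * x + b) >= 0.5)

print([predict(x) for x in xs])
```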
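For linear regression with a single explanatory variable, the least-squares slope and intercept have a closed form. A sketch on made-up points that lie exactly on y = 1 + 2x:

```python
# Least-squares fit of y = a + b*x on toy points (illustration only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]        # exactly y = 1 + 2x

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope and intercept from the normal equations.
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
a = mean_y - b * mean_x

print(a, b)  # 1.0 2.0
```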
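The nearest-neighbors bullet, take the k closest training examples and vote, fits in a few lines. The 2-D training points below are invented for the sketch:

```python
# Sketch of k-nearest-neighbour classification (k = 3) on toy 2-D points.
from collections import Counter
import math

train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")]

def predict(point, k=3):
    # Rank training examples by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda row: math.dist(point, row[0]))[:k]
    # Majority vote among the k closest labels.
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(predict((1.1, 1.0)))  # a
print(predict((5.1, 5.0)))  # b
```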
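Because a decision tree is a flow chart, a tiny hand-written one (not learned from data; the attributes and thresholds are invented) can be expressed as nested conditionals:

```python
# A toy decision tree written by hand, for illustration only.
def classify(outlook, humidity):
    if outlook == "sunny":        # root node: test on 'outlook'
        if humidity > 70:         # internal node: test on 'humidity'
            return "stay"         # leaf node: class label
        return "play"             # leaf node: class label
    return "play"                 # rainy/overcast branch: class label

print(classify("sunny", 80))  # stay
print(classify("sunny", 50))  # play
```

Tree-learning algorithms such as ID3 or CART build structures like this automatically by choosing the attribute tests from data.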