Quick Intro to Machine Learning for Beginners

Machine learning technology and algorithms, their use in various industries, and successful business examples of machine learning implementation.

Machine learning (ML), Artificial Intelligence (AI), and Data Science have been the top trends for the last few years. However, what does it all mean?

In this article, we’ll focus on ML technology, its algorithms, how it can help various industries, and successful business examples of machine learning implementation.

If you would like to read more about Pattern Recognition, which is a part of Machine Learning technology, we have a separate detailed article about it with graphs, schemes, and useful information in plain English (or as close to plain English as possible).

In this issue:

  • Definitions
  • How does machine learning work
  • Real business cases
    • Banking
    • Government
    • Healthcare
    • Retail & E-commerce
    • Oil, Gas, and Energy Sector
    • Transportation & Logistics
    • AdTech

What Are Machine Learning Algorithms: Definition of Machine Learning

Before we define machine learning (ML) itself, let’s take a look at the general scheme of things. Machine Learning (ML) is a part of Artificial Intelligence (AI), which is a part of Computer and Data Science. Check out this helpful graph below:

Figure: Machine Learning, AI, and Computer Science Dependencies [Source]

What’s an algorithm? An algorithm is a sequence of instructions one must perform to solve a problem. The flowchart below, for example, shows a simple, straightforward algorithm.

Figure: “What coffee should I get?” flowchart, from Introduction to Flow Charts [Source: www.phdcomics.com]

Machine Learning is a system of automated data processing algorithms that support decision-making and improve their performance based on the results. The “learning” implies that the algorithm can glean new information and insights without being explicitly programmed. There are several types of machine learning:

  • Supervised ML (the training data is labeled: each input is paired with the correct output, and the algorithm learns to map inputs to outputs)
  • Unsupervised ML (the inputs are unlabeled, and the algorithms have to find patterns on their own)
  • Reinforcement ML (instead of labeled outputs, the algorithm receives rewards or penalties for its actions, and its goal is to maximize the total reward)
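To make the supervised case concrete, here is a minimal Python sketch of one of the simplest supervised algorithms, a nearest-centroid classifier. It assumes hypothetical labeled fruit measurements: during training it “learns” the average point of each labeled class, and it then assigns a new, unlabeled input to the class with the closest average. The function names are illustrative, not taken from any library.

```python
from math import dist

def fit_centroids(points, labels):
    """Training step: compute the mean point (centroid) of each labeled class."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        total = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in total)
            for label, total in sums.items()}

def predict(centroids, point):
    """Prediction step: return the label whose centroid is closest to the point."""
    return min(centroids, key=lambda label: dist(centroids[label], point))

# Hypothetical labeled training data: (weight in grams, diameter in cm) -> label
points = [(150, 7.0), (170, 7.5), (160, 7.2), (5, 1.5), (6, 1.7), (4, 1.4)]
labels = ["apple", "apple", "apple", "grape", "grape", "grape"]

centroids = fit_centroids(points, labels)
print(predict(centroids, (140, 6.8)))  # apple
```

The labels are what make this supervised: without them, the algorithm would have no classes to average.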

Artificial intelligence (AI) is a part of Computer Science that focuses on systems that can solve problems and perform tasks that require human (or human-like) intelligence. What does machine learning do in this process? It learns from data, acquiring new information in a way loosely similar to a human brain.

Despite being a top trend in the last decade, machine learning dates back to 1952, when IBM’s Arthur Samuel wrote a program for playing checkers; Samuel went on to coin the term “machine learning” in 1959. For a long time, it remained a mostly mathematical, academic field. However, once advances in computing made ML practical, numerous industries started looking for ways to apply this empowering technology for their business purposes.

How Do Machine Learning Algorithms Work: Machine Learning Principles

Machine learning algorithms work like an infinite loop. The end goal depends on the type of ML algorithm, but in principle, the results can be continuously improved by going through cycles such as this one:

  • Data (most of the time unlabeled) comes from various sources into a single storage.
  • ML algorithms sort through that data:
    • Label it according to the settings.
    • Look for patterns in it.
  • Sorted data is translated into usable insights.
  • Insights are used to enhance and optimize business processes.
  • Business and learning processes are automated. Once you’ve got one batch of insights and adjusted your processes, you can’t simply stop. You need to keep the cycle going so that everything works like clockwork.

It all sounds quite straightforward (and sometimes it is, when you’re dealing with small amounts of incoming information). What is required for a good machine learning system?

  1. Capabilities for data preparation. You need to make sure you’ve got the hardware and software necessary to collect, process, analyze, integrate, and store this data.
  2. ML algorithms. There are numerous types and kinds of machine learning algorithms that we’ll take a look at a bit later in this article. 
  3. Automation and iterative processes, so you don’t have to repeat the same work manually over and over.
  4. Scalability. Even though your data volume might be small right now, it’s better to prepare your system for potential future growth, because adjusting your project’s architecture once the product is live is not an easy thing to do.
  5. Ensemble modeling. The idea here is to use two or more algorithms at the same time and combine their results (for example, by voting or averaging) to get more accurate predictions.
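One common way to combine the results of several algorithms is majority voting: each model predicts a label, and the ensemble returns the label most models agree on. The Python sketch below assumes three hypothetical, hand-written “models” that stand in for trained classifiers; the transaction fields and thresholds are made up for illustration.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most models.

    Counter.most_common orders equal counts by first occurrence,
    so in a tie the model listed first wins."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical rule-based "models" standing in for trained classifiers.
def model_amount(tx):
    return "fraud" if tx["amount"] > 1000 else "ok"

def model_location(tx):
    return "fraud" if tx["country"] != tx["home_country"] else "ok"

def model_time(tx):
    return "fraud" if tx["hour"] < 5 else "ok"

MODELS = [model_amount, model_location, model_time]

def ensemble_predict(tx):
    """Ask every model for a label, then combine the answers by voting."""
    return majority_vote([model(tx) for model in MODELS])

tx = {"amount": 1500, "country": "PL", "home_country": "PL", "hour": 3}
print(ensemble_predict(tx))  # fraud: two of the three models vote "fraud"
```

Averaging predicted probabilities instead of counting votes is a common alternative when the models output scores rather than labels.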

There are a lot of specific terms when it comes to machine learning. Let’s stop for a moment to take a look at the terms and understand what they mean.

  • Epoch in machine learning. An epoch is complete when each sample in the training dataset has had the chance to update the internal model parameters. Each epoch consists of one or more batches. The number of epochs is a hyperparameter that states how many times the algorithm goes through the entire training dataset.
  • Learning rate in machine learning. The learning rate is a hyperparameter that controls how much the model changes in response to the estimated error each time the model weights are updated. Usually, the learning rate is a small positive value between 0.0 and 1.0.
  • Embedding in machine learning. Embeddings are ways to map non-vector data (for example, images, texts, graphs, etc.) into a vector space, where ML algorithms can process it. TensorFlow’s Embedding Projector illustrates this idea well.
  • Regression in machine learning. ML borrowed this term from statistics. Regression is the task of predicting a value when the target quantity is continuous. For example, suppose you’re building a simple calculator for the price of a piece of land based on its area. As the area increases, the price increases as well, and the line (or curve) that describes this relationship is what regression estimates. There are different types of regression: Simple Linear Regression, Polynomial Regression, Support Vector Regression, Decision Tree Regression, and Random Forest Regression.
  • Normalization in machine learning. The idea behind this technique is to rescale the values of numeric columns in the dataset to a common scale without distorting the relative differences between values. Not every dataset needs normalization; it’s only required when features have different ranges.
  • Online learning in machine learning. Online ML is a method in which data becomes available in sequential order, and the best predictor for future data is updated at each step, instead of learning on the entire training dataset at once to get the best predictor. This method is used, for example, in stock price predictions, when it’s not feasible to get an entire dataset for training at once. 
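Several of these terms fit into one small example. The Python sketch below fits a simple linear regression of land price on land area: it min-max normalizes the area feature, then runs gradient descent for a fixed number of epochs with a fixed learning rate. The data points are hypothetical and perfectly linear so the result is easy to check, and the learning rate (0.1) and epoch count (1000) are illustrative assumptions, not recommendations.

```python
def min_max_normalize(values):
    """Rescale values to the [0, 1] range using (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values], lo, hi

def train_linear_regression(xs, ys, learning_rate=0.1, epochs=1000):
    """Fit y = w * x + b by gradient descent on the squared error.

    One epoch is one full pass over the training samples. The weights are
    updated after every sample (stochastic gradient descent)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            error = (w * x + b) - y          # estimated error for this sample
            w -= learning_rate * error * x   # learning rate scales each step
            b -= learning_rate * error
    return w, b

# Hypothetical data: land area in square meters -> price in thousands
areas = [100, 200, 300, 400, 500]
prices = [210, 410, 610, 810, 1010]  # made up as 2 * area + 10

scaled_areas, area_min, area_max = min_max_normalize(areas)
w, b = train_linear_regression(scaled_areas, prices)

def predict_price(area):
    """Scale a new area with the training min/max, then apply the fitted line."""
    return w * (area - area_min) / (area_max - area_min) + b

print(round(predict_price(250), 1))  # close to 510, i.e. 2 * 250 + 10
```

Because the weights are updated after every single sample rather than once per full pass, the same inner loop could keep learning from new (area, price) pairs as they arrive, which is the idea behind online learning.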

Does every business need machine learning? As much as we’re tempted to say “YES!” because it would mean more work and clients for us, there are times when you don’t need ML to get insights: for example, when your business is small and incoming data can be easily analyzed and manipulated using simpler tools, like Google Sheets.

However, ML is a lifesaver when it comes to businesses that deal with big data. Together with cloud computing power, machine learning algorithms enable fast and thorough processing and integration of data, whether it’s user behavior on your e-commerce website, DNA analysis for MedTech projects, or your ads’ effectiveness in an AdTech service. 

What Algorithm Does Machine Learning Use

We’ve already mentioned that there are many machine learning algorithms, but let’s stop for a bit to talk about them in more detail.

There are four major categories of ML algorithms:

  • Supervised Machine Learning Algorithms
    • Linear Regression
    • Logistic Regression
    • Random Forest
    • Gradient Boosted Trees
    • Support Vector Machines (SVM)
    • Neural Networks
    • Decision Trees
    • Naive Bayes
    • Nearest Neighbor
  • Semi-supervised Machine Learning Algorithms
  • Unsupervised Machine Learning Algorithms
    • k-means clustering
    • t-SNE (t-Distributed Stochastic Neighbor Embedding)
    • PCA (Principal Component Analysis)
    • Association rule
  • Reinforcement Machine Learning Algorithms
    • Q-Learning
    • Temporal Difference (TD)
    • Monte-Carlo Tree Search (MCTS)
    • Asynchronous Advantage Actor-Critic (A3C)
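For contrast with the supervised algorithms, here is a minimal Python sketch of k-means clustering, the first unsupervised algorithm in the list. No labels are given: the algorithm alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The customer data is hypothetical, and k (the number of clusters) has to be chosen up front.

```python
import random
from math import dist

def k_means(points, k, iterations=100, seed=0):
    """Group unlabeled points into k clusters (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(
                    tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters

# Hypothetical unlabeled data: (store visits per year, average spend)
customers = [(1, 10), (2, 12), (1, 11), (20, 200), (22, 210), (21, 190)]
centroids, clusters = k_means(customers, k=2)
print(sorted(centroids))  # one centroid per discovered customer group
```

The groups it finds (here, occasional and frequent customers) have no names; interpreting them is up to you.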

When you are facing the choice of “which machine learning algorithm to use,” you need to consider several factors, including: 

  • What is the size, quality, and nature of your information?
  • What’s your available computational time or, simply put, do you have a deadline?
  • How urgent is the task?
  • What do you want to do with the data itself and the results?

Check out this handy cheat sheet (by the way, this is also an example of an algorithm): 

Figure: Machine learning algorithm cheat sheet [Source]

Machine Learning Algorithms Pros and Cons

One of the main advantages of ML algorithms over traditional statistical methods is their ability to process enormous numbers of records quickly and make more accurate predictions based on that incoming data. These predictions yield business intelligence insights that are valuable in the decision-making process.

Machine learning algorithms can also automate and improve numerous simple jobs, which helps to streamline business processes and optimize expenses (which is always a good thing in business).

There are numerous business applications for ML technology, and we’ll talk about them later in the article. However, like any good thing, there are drawbacks and challenges to machine learning. 

When anyone is making a decision, there is always some sort of bias involved. For example, if a person is trying to decide where to have dinner and with whom, numerous factors come into play: where that person ate before, the time of day, their mood, and whether they want to talk (or not talk) during dinner. Most of the time, the answer will include a bit of bias based on previous experience.

With machine learning algorithms, bias can also cause problems. On the one hand, large amounts of data can dilute isolated outliers, because each one becomes negligible among the rest (for example, one bad review of a restaurant gets lost in a thousand good ones). On the other hand, if your training dataset itself is biased, all your results will be biased as well, and this is when machine learning goes wrong (for example, if your dataset contains mostly negative reviews).
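The difference between an isolated outlier and a biased dataset is easy to see with plain averages. In this hypothetical Python sketch, the “model” is simply the mean star rating of a restaurant: a single bad review among a thousand good ones barely changes it, while a sample that over-represents negative reviews drags it far down. Real models are far more complex, but they can inherit skewed training data in the same way.

```python
from statistics import mean

# Hypothetical star ratings (1 to 5) for one restaurant.
balanced = [5] * 1000 + [1]        # one bad review among a thousand good ones
print(round(mean(balanced), 3))    # 4.996: the outlier barely moves the average

# Hypothetical skewed sample: mostly negative reviews were collected,
# for example because only complaints were scraped.
skewed_sample = [1] * 80 + [5] * 20
print(mean(skewed_sample))         # 1.8: the "learned" rating inherits the skew
```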

There are other challenges that you need to be aware of when dealing with data and training your ML algorithms. They aren’t a problem in themselves, but something you need to think about before you decide to implement machine learning algorithms into business processes. 

Machine Learning and Big Data Challenges

Data Collection & Usage

If you simply collect information, nothing will happen (except that your storage will fill up). To transform incoming data into value-added business insights, you need to understand what kind of data you need and how you plan to use it.

Therefore, before you start to train your machine learning algorithm on a dataset, write down the goals for the algorithm and what kind of information can be helpful to reach that goal. For example, if you’re trying to decide which restaurant to pick, you don’t need a weather forecast for the day (unless you want to sit on the terrace) or the name of the sous chef (unless you’re tracking down a Michelin one).

If you end up getting that information as well, it won’t alter the results, but the process of wading through the numbers would take more effort and time. When we’re talking about big data projects, this issue becomes even more important.

Security

Cybersecurity is one of the hottest topics these days. With so many people sharing their personal information and millions of bots generating even more online data, it has become relatively easy to sway public opinion toward one decision or another. Think of Brexit and the Cambridge Analytica scandal (the Brexit movie shows it quite well), or the Equifax data breach: these cases are all about collected personal data being misused or exposed.

Therefore, when you are collecting information, you need to make sure you’re not violating the law. This is especially sensitive since the General Data Protection Regulation (GDPR) became enforceable in May 2018. GDPR protects people in the European Union, but its reach extends beyond EU borders: if, for example, you’re located in the United States but offer your website or application to people in the EU, GDPR can still apply, and you need a lawful basis, such as their consent, for collecting their data.

When cybersecurity meets machine learning, you need to make sure the data you collect is regulated (clear permissions for different levels of users), informed (official consent agreement), monitored, and transparent. The data collected in this way can be used safely for training your ML algorithms. 

Data Validation

When you have incoming data from several sources, you need to pay attention to its credibility. Is there any information that can cause harm? Semi-supervised machine learning algorithms help automate the labeling process: they go through unlabeled data and compare it to the available examples of already-labeled information.
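A minimal Python sketch of this idea is self-training with a nearest-neighbor rule: a handful of hand-labeled records are used to label the unlabeled record that most resembles one of them, and each newly labeled record then helps label the next. The two-feature records and the “valid” and “suspicious” labels are hypothetical; real semi-supervised pipelines often use a trained model and only accept labels above a confidence threshold.

```python
from math import dist

def self_train(labeled, unlabeled):
    """Iteratively pseudo-label unlabeled points.

    labeled:   dict mapping point -> label (the small hand-labeled set)
    unlabeled: list of points without labels
    Each round labels the unlabeled point that is closest to any labeled
    point with that neighbor's label, then treats it as labeled too."""
    labeled = dict(labeled)     # don't modify the caller's data
    remaining = list(unlabeled)
    while remaining:
        # Find the (unlabeled, labeled) pair with the smallest distance
        point, neighbor = min(
            ((u, l) for u in remaining for l in labeled),
            key=lambda pair: dist(*pair),
        )
        labeled[point] = labeled[neighbor]
        remaining.remove(point)
    return labeled

# Hypothetical records described by two numeric features
labeled = {(1.0, 1.0): "valid", (9.0, 9.0): "suspicious"}
unlabeled = [(2.0, 1.5), (3.0, 2.0), (8.0, 8.5), (7.0, 7.0)]
print(self_train(labeled, unlabeled))
```

Each round compares every unlabeled point with every labeled one, so this sketch only suits small datasets.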

Right Algorithms

Every machine learning model has a purpose and is designed to perform specific tasks. Therefore, one of the challenges in ML is choosing the right algorithm that would bring you the insights you are looking for. For example, an unsupervised algorithm won’t be able to label your data correctly, because its primary purpose is to look for patterns; conversely, a supervised algorithm can’t learn without labeled examples. Before you implement ML, once again, you need to write down your expectations from this technology and what the end goal is.

Training Dataset

When you’re training your machine learning algorithm, you need a good, large training dataset so that the algorithm can identify the major patterns, information, and insights. If your dataset is small, the results might be unreliable or heavily biased. Data can come from data gathering services (for example, your business’s Google Analytics account), sample datasets (faux data), or third-party vendors. Whatever you do, remember that your data needs to be credible and validated, relevant to the algorithm’s purpose, and cleaned. It also needs to be legally obtained (for obvious reasons).

In some cases, you might use machine learning algorithms for small data sets, but most of the time, it’s not cost-effective. Therefore, either make sure you’ve got enough data or don’t invest in ML algorithms at this point. 

Data Noise

According to IBM researchers, “Pareto’s Principle applies: 80% of a data scientist’s valuable time is spent simply finding, cleansing, and organizing data, leaving only 20% to perform analysis.”

Data noise is any data that isn’t relevant to the ML algorithm’s purpose. Take our restaurant example from above. If you are looking for a restaurant with a specific sous-chef or dish, information about restaurants’ sous-chefs or menus is necessary for your purpose. If, however, you’re just looking for a good place to eat nearby, the name of the sous-chef is data noise, because it means nothing to you. Data noise can be incomplete information, inconsequential data, anomalous bits, and information that can’t be identified.

Why is this important? Data noise can dilute your machine learning algorithm’s efficiency, so the results you get will not be as precise as they could be. For example, consider the table below, which calculates the average of some imaginary salaries:

                SALARY SOURCE #1    SALARY SOURCE #2
Salary 1        10                  10
Salary 2        15                  15
Salary 3        43                  43 (Anomaly)
Salary 4        12                  12
AVERAGE         20                  12.3

Here, 43 is an anomaly: one person earns more than the other three combined (10 + 15 + 12 = 37). If this data noise isn’t taken out of consideration, it skews the average up to 20 (Source #1). If the anomalous salary is excluded from the calculation, as in Source #2, the average becomes a much more realistic 12.3.
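In practice, anomalies are usually flagged by a rule rather than by eye. The Python sketch below uses one common rule, the modified z-score based on the median absolute deviation (MAD), to drop the anomalous salary before averaging. This particular rule and its 3.5 cutoff are an assumption chosen for illustration; the salary table itself does not prescribe any method.

```python
from statistics import mean, median

def remove_anomalies(values, threshold=3.5):
    """Drop values whose modified z-score exceeds the threshold.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation. Median-based statistics are used because, unlike
    the mean, they are barely affected by the anomaly itself."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return list(values)  # no spread to measure against; keep everything
    return [v for v in values if abs(0.6745 * (v - med) / mad) <= threshold]

salaries = [10, 15, 43, 12]
print(mean(salaries))                              # 20, skewed by the anomaly
print(round(mean(remove_anomalies(salaries)), 1))  # 12.3
```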

Training ML Model

Once you’ve sorted out the purpose of machine learning for your particular business and understood what kind of information you’d like to collect, it’s time to actually train your machine learning model. There is no need to reinvent the wheel these days, because there are numerous tools available for it, including TensorFlow, Valohai, PyTorch, Apache Spark, and many others.

Machine Learning Industries

When To Use Machine Learning: Machine Learning Algorithms and Their Use Cases

  • Self-driving cars from Google – that’s machine learning. 
  • Online recommendations from Netflix – machine learning.
  • Knowing what people say about you online – natural language processing and deep learning algorithms for sentiment analysis.

As we mentioned above, numerous businesses already reap the benefits of machine learning algorithms. 

BANKING & FINANCIAL SERVICES

Financial services companies and banks deal with a lot of numerical data, which makes them one of the best fits for machine learning algorithms for anomaly detection and fraud prevention. Besides protecting against high-risk clients and signs of manipulation, these insights also help to identify investment opportunities, and machine learning algorithms can be used for trading as well.

GOVERNMENT

Public safety and utility agencies can also benefit from the insights provided by data mining and machine learning. One of the implementation areas is energy efficiency, which helps to minimize both expenses and energy load.

Machine learning algorithms for face recognition help with surveillance and protection from identity theft. 

HEALTHCARE

Machine learning for healthcare predictions is a fast-growing trend thanks to wearable devices and sensors. With them, a patient’s data can be fed to machine learning algorithms in real time, which can help save lives.

Big data analytics, in combination with machine learning algorithms, can also help in analyzing trends or identifying red flags in terms of diagnosis and treatment. 

Machine learning algorithms for image processing and image classification make it possible to identify abnormal formations in various human organs and help with early cancer detection, among other uses.

HUSPI had a chance to provide IT consulting services to one such project, called Homeopath. Using machine learning algorithms for pattern recognition, prediction, and regression, the system, once launched, would continuously update its records with newer findings, making future patients’ treatments more precise.

RETAIL & ECOMMERCE

With the help of machine learning algorithms for recommendation systems, retail and e-commerce businesses can improve customer acquisition. Retailers can also capture data, analyze it, and use it to provide a personalized shopping experience, implement marketing campaigns, optimize prices, manage supply planning, and get customer insights.
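One of the simplest recommendation techniques is item co-occurrence (“customers who bought X also bought Y”). The Python sketch below counts how often products appear in the same order and recommends the most frequent companions. The order history is hypothetical, and production recommendation systems typically rely on richer methods such as collaborative filtering and far larger datasets.

```python
from collections import Counter
from itertools import permutations

def build_co_occurrence(orders):
    """Count how often each ordered pair of products appears in the same order."""
    co = {}
    for order in orders:
        for a, b in permutations(set(order), 2):
            co.setdefault(a, Counter())[b] += 1
    return co

def recommend(co, product, n=2):
    """Return the n products most often bought together with `product`."""
    return [item for item, _ in co.get(product, Counter()).most_common(n)]

# Hypothetical order history
orders = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["phone", "charger"],
    ["phone", "case", "headphones"],
    ["laptop", "mouse"],
    ["laptop", "mouse", "case"],
]
co = build_co_occurrence(orders)
print(recommend(co, "phone"))  # ['case', 'charger']
```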

OIL, GAS, AND ENERGY SECTOR

Machine learning algorithms for classification help to analyze the minerals in the ground, find new energy sources, and streamline oil and gas distribution to make it more cost-effective. Unsupervised machine learning algorithms for pattern detection are also used to diagnose sensor failures or manufacturing defects that would otherwise go undetected.

TRANSPORTATION & AUTOMOTIVE

What is the most efficient route? How can traffic capacity be increased? How can a city optimize its traffic light system? All of these questions can be answered with the help of insights from machine learning algorithms. Have you ever used Waze and enjoyed its ability to adapt the route on the fly to the best possible one given the set of filters? Yep, this is machine learning at work.

Data analysis and modeling are also helpful tools for companies such as delivery providers, public transportation, and other organizations that deal with transport. 

Driverless cars are also powered by machine learning algorithms, especially their automatic energy response systems. Besides these real-life examples, machine learning is also used in the gaming industry. For example, Grand Theft Auto uses a collision detection machine learning algorithm for moving people and cars.

ADVERTISING TECHNOLOGIES (ADTECH)

AdTech businesses lean heavily on machine learning algorithms. One reason is that AdTech is a true Big Data industry: the incoming ads, the outgoing ads, the prices per ad, all payments and fees to third parties, etc. – all of this has to be accounted for.

Similar to ML in e-commerce, machine learning algorithms for advertising help to make sure that:

  • there is no ad fraud involved
  • customers are targeted with ads they would like (based on recommendation engines and user preferences)
  • publishers get paid
  • and everyone is happy.

Machine learning algorithms are also key to real-time bidding systems and retargeting systems. 

Should You Use Machine Learning for Business?

Machine learning algorithms for data science and analysis are here to stay. Does every single business need it? Probably not, since numerous businesses are small and don’t generate a lot of data.

At the same time, with the technological progress we are witnessing today, the amount of data will likely keep growing exponentially, so it might be wise to prepare your product’s architecture for the possibility of ML.

Need IT consultation in terms of machine learning algorithms or how to prepare your infrastructure for the incoming data? Contact us – we can help.

P.S. HUSPI has been named #1 among the TOP Artificial Intelligence companies in Poland.