In this section, we will learn about the role of predictions in agile data science. Interactive reports uncover various aspects of the data. Predictions form the fourth layer of the agile sprint.
When making predictions, we generally refer to past data and use it to draw inferences about the future. In this process, the focus shifts from batch processing of historical data to real-time data about the future.
The role of predictions incorporates the following −
Predictions help in forecasting. Some forecasts are based on statistical inference. Some of the predictions are based on opinions of pundits.
Statistical inference is involved in expectations of all kinds.
Forecasts are sometimes accurate and sometimes inaccurate.
Predictive analytics incorporates a variety of statistical techniques from predictive modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future or otherwise unknown events.
Predictive analytics requires training data. Training data consists of independent and dependent features. Dependent features are the values a user is trying to predict. Independent features describe the things we want to predict and serve as the inputs from which the dependent features are estimated.
The study of features is called feature engineering; it is crucial to making predictions. Data visualization and exploratory data analysis are parts of feature engineering; these form the core of agile data science.
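The split between independent and dependent features can be illustrated with a minimal sketch. The flight records, the field names, and the helper `split_features_target` are hypothetical and exist only for illustration; the sketch assumes that the arrival delay is the value we want to predict.

```python
# Hypothetical training records: each row describes one flight.
records = [
    {"distance_km": 500, "dep_delay_min": 5, "arr_delay_min": 3},
    {"distance_km": 1200, "dep_delay_min": 20, "arr_delay_min": 25},
    {"distance_km": 800, "dep_delay_min": 0, "arr_delay_min": -2},
]

# Assumption: the arrival delay is the dependent feature (the target).
TARGET = "arr_delay_min"


def split_features_target(rows, target):
    """Separate each row into independent features and the dependent value."""
    # Independent features: every column except the target.
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    # Dependent feature: the value we are trying to predict.
    targets = [row[target] for row in rows]
    return features, targets


X, y = split_features_target(records, TARGET)
```

Here `X` holds the independent features for each flight and `y` holds the matching values to predict, which is the shape most modeling tools expect as input.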
There are two ways of making predictions in agile data science −
Regression
Classification
Whether to build a regression or a classification model depends entirely on the business requirements and their analysis. Forecasting a continuous variable leads to a regression model, and predicting a categorical variable leads to a classification model.
Regression takes examples made up of features and produces a numeric output.
Classification takes the same kind of input and produces a categorical output.
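The difference between the two kinds of output can be sketched with a toy example. The data and function names below are hypothetical, and the two techniques (ordinary least squares with one feature for regression, a 1-nearest-neighbour rule for classification) are chosen only because they are short to write; they are not prescribed by this tutorial.

```python
# Toy data (hypothetical): departure delay in minutes is the single feature.
dep_delay = [0, 10, 20, 30]
arr_delay = [2, 12, 22, 32]                      # numeric target: regression
status = ["on_time", "on_time", "late", "late"]  # categorical target: classification


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_regression(model, x):
    """Apply the fitted line to a new feature value; the result is a number."""
    slope, intercept = model
    return slope * x + intercept


def predict_class(xs, labels, x):
    """1-nearest-neighbour: return the label of the closest training example."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return labels[nearest]


line = fit_line(dep_delay, arr_delay)
numeric_prediction = predict_regression(line, 40)        # a number
label_prediction = predict_class(dep_delay, status, 27)  # a category
```

Both models learn from the same feature, but the regression model returns a number while the classification model returns one of the labels seen in the training data.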
Note − The example dataset that serves as input to statistical prediction and enables the machine to learn is called “training data”.