Research-Methodology

Regression Analysis

Regression analysis is a quantitative research method used when a study involves modelling and analysing several variables, where the relationship includes a dependent variable and one or more independent variables. In simple terms, it tests the nature of the relationship between a dependent variable and one or more independent variables.

The basic form of regression models includes unknown parameters (β), independent variables (X), and the dependent variable (Y).

A regression model specifies the relation of the dependent variable (Y) to a function of the independent variables (X) and unknown parameters (β):

                                    Y  ≈  f (X, β)   

A regression equation can be used to predict the value of ‘y’ when the value of ‘x’ is given, where ‘y’ and ‘x’ are two sets of measures from a sample of size ‘n’. The formulae for the regression equation would be:

[Figure: regression equation formulae (image not reproduced)]

Do not be intimidated by the visual complexity of the correlation and regression formulae above. You don’t have to apply the formulae manually: correlation and regression analyses can be run with popular analytical software such as Microsoft Excel, Microsoft Access, SPSS and others.
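
For readers who do want to see the arithmetic, the following is a minimal Python sketch of the textbook least-squares estimates for a line y = a + bx fitted to n paired measurements. It is an illustration only: the function name `fit_simple_regression` and the sample data are made up here, and the notation may differ from the formulae in the original figure.

```python
def fit_simple_regression(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom   # slope
    a = (sum_y - b * sum_x) / n                # intercept
    return a, b


# Illustrative data (made up): the points lie exactly on y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
a, b = fit_simple_regression(x, y)
predicted = a + b * 6   # predicted y for a new x value of 6
```

With these made-up points the fitted intercept and slope recover 1 and 2, so the prediction for x = 6 is 13.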

Linear regression analysis is based on the following set of assumptions:

1. Assumption of linearity. There is a linear relationship between the dependent and independent variables.

2. Assumption of homoscedasticity. The variance of the error term is constant across all values of the independent variables.

3. Assumption of absence of collinearity or multicollinearity. There is no strong correlation between two or more independent variables.

4. Assumption of normal distribution. The error terms (residuals) are normally distributed.

John Dudovskiy

Regression: Definition, Analysis, Calculation, and Example

What Is Regression?

Regression is a statistical method used in finance, investing, and other disciplines that attempts to determine the strength and character of the relationship between a dependent variable and one or more independent variables.

Linear regression is the most common form of this technique. In its simplest form (simple linear regression, usually estimated by ordinary least squares, or OLS), it establishes the linear relationship between two variables.

Linear regression is graphically depicted using a straight line of best fit with the slope defining how the change in one variable impacts a change in the other. The y-intercept of a linear regression relationship represents the value of the dependent variable when the value of the independent variable is zero. Nonlinear regression models also exist, but are far more complex.

Key Takeaways

  • Regression is a statistical technique that relates a dependent variable to one or more independent variables.
  • A regression model is able to show whether changes observed in the dependent variable are associated with changes in one or more of the independent variables.
  • It does this by essentially determining a best-fit line and seeing how the data is dispersed around this line.
  • Regression helps economists and financial analysts in things ranging from asset valuation to making predictions.
  • For regression results to be properly interpreted, several assumptions about the data and the model itself must hold.

In economics, regression is used to help investment managers value assets and understand the relationships between factors such as commodity prices and the stocks of businesses dealing in those commodities.

Although regression is a powerful tool for uncovering the associations between variables observed in data, it cannot easily indicate causation. Regression as a statistical technique should not be confused with the concept of regression to the mean, also known as mean reversion.

Understanding Regression

Regression captures the correlation between variables observed in a data set and quantifies whether those correlations are statistically significant or not.

The two basic types of regression are simple linear regression and multiple linear regression, although there are nonlinear regression methods for more complicated data and analysis. Simple linear regression uses one independent variable to explain or predict the outcome of the dependent variable Y, while multiple linear regression uses two or more independent variables to predict the outcome. Analysts can use stepwise regression to examine each independent variable contained in the linear regression model.

Regression can help finance and investment professionals. For instance, a company might use it to predict sales based on weather, previous sales, gross domestic product (GDP) growth, or other types of conditions. The capital asset pricing model (CAPM) is an often-used regression model in finance for pricing assets and discovering the costs of capital.

Regression and Econometrics

Econometrics is a set of statistical techniques used to analyze data in finance and economics. An example of the application of econometrics is to study the income effect using observable data. An economist may, for example, hypothesize that as a person increases their income, their spending will also increase.

If the data show that such an association is present, a regression analysis can then be conducted to understand the strength of the relationship between income and consumption and whether or not that relationship is statistically significant.

Note that you can have several independent variables in an analysis, for example, changes to GDP and inflation in addition to unemployment in explaining stock market prices. When more than one independent variable is used, it is referred to as multiple linear regression. This is the most commonly used tool in econometrics.

Econometrics is sometimes criticized for relying too heavily on the interpretation of regression output without linking it to economic theory or looking for causal mechanisms. It is crucial that the findings revealed in the data are able to be adequately explained by a theory.

Calculating Regression

Linear regression models often use a least-squares approach to determine the line of best fit. The least-squares technique chooses the line that minimizes the sum of squares. Each square is, in turn, determined by squaring the vertical distance between a data point and the regression line (or, for the total sum of squares, the mean value of the data set).
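
To make the idea of "minimizing the sum of squares" concrete, here is a small Python sketch. The helper names `sum_of_squares` and `least_squares_line` and the data are illustrative assumptions, not part of the article. It fits a line with the usual closed-form estimates and then checks that nudging the intercept or slope never lowers the sum of squared distances.

```python
def sum_of_squares(x, y, a, b):
    """Sum of squared vertical distances between the points and the line a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


def least_squares_line(x, y):
    """Closed-form least-squares intercept and slope for one predictor."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Made-up, slightly noisy data scattered around a straight line.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_line(x, y)
best = sum_of_squares(x, y, a, b)

# Try nearby lines: shifting the intercept or slope should not reduce the total.
candidates = [(a + da, b + db) for da in (-0.1, 0.0, 0.1) for db in (-0.1, 0.0, 0.1)]
worse_or_equal = all(sum_of_squares(x, y, ca, cb) >= best for ca, cb in candidates)
```

For this data the fitted line has an intercept of about 0.05 and a slope of about 1.99, and every perturbed candidate has a sum of squares at least as large as the fitted line's.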

Once this process has been completed (usually done today with software), a regression model is constructed. The general form of each type of regression model is:

Simple linear regression:

$$Y = a + bX + u$$

Multiple linear regression:

$$Y = a + b_1X_1 + b_2X_2 + b_3X_3 + \dots + b_tX_t + u$$

where:

  • Y = the dependent variable you are trying to predict or explain
  • X = the explanatory (independent) variable(s) you are using to predict or associate with Y
  • a = the y-intercept
  • b = (beta coefficient) the slope of the explanatory variable(s)
  • u = the regression residual or error term
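
As a rough sketch of how software might estimate a and the b coefficients, the Python below solves the least-squares normal equations with plain Gaussian elimination. This is one possible approach written for illustration; the function names and data are assumptions, and production tools typically use more numerically robust methods.

```python
def solve_linear_system(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfectly collinear predictors?)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):                        # back substitution
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def fit_multiple_regression(rows, y):
    """Return [a, b1, ..., bt] minimizing the sum of squared residuals."""
    X = [[1.0] + list(r) for r in rows]    # leading 1.0 column yields the intercept a
    k = len(X[0])
    XtX = [[sum(xr[i] * xr[j] for xr in X) for j in range(k)] for i in range(k)]
    Xty = [sum(xr[i] * yi for xr, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(XtX, Xty)


# Made-up data: each row is (X1, X2); Y is generated without noise.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefs = fit_multiple_regression(rows, y)
```

Because the illustrative Y values are generated without noise from 1 + 2·X1 + 3·X2, the fitted coefficients come out close to [1, 2, 3], up to floating-point error.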

Example of How Regression Analysis Is Used in Finance

Regression is often used to determine how specific factors—such as the price of a commodity, interest rates, particular industries, or sectors—influence the price movement of an asset. The aforementioned CAPM is based on regression, and it's utilized to project the expected returns for stocks and to generate costs of capital. A stock’s returns are regressed against the returns of a broader index, such as the S&P 500, to generate a beta for the particular stock.

Beta is the stock’s risk in relation to the market or index and is reflected as the slope in the CAPM. The return for the stock in question would be the dependent variable Y, while the independent variable X would be the market risk premium.
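
Since beta is the slope of that regression, it can be sketched as the covariance of the stock's returns with the market's returns divided by the variance of the market's returns. The Python below is illustrative only: the return series are invented, and they are assumed to already be measured in excess of the risk-free rate.

```python
def beta(stock_returns, market_returns):
    """Slope from regressing stock returns (Y) on market returns (X)."""
    n = len(stock_returns)
    mean_s = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((s - mean_s) * (m - mean_m)
              for s, m in zip(stock_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    return cov / var


# Hypothetical excess returns per period; the stock is built to move
# 1.5 times as much as the market, plus a small constant.
market = [0.01, -0.02, 0.03, 0.015, -0.005]
stock = [0.001 + 1.5 * m for m in market]

stock_beta = beta(stock, market)
```

Here `stock_beta` comes out at about 1.5, reflecting how the hypothetical stock was constructed.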

Additional variables such as the market capitalization of a stock, valuation ratios, and recent returns can be added to the CAPM to get better estimates for returns. These additional factors are known as the Fama-French factors, named after the professors who developed the multiple linear regression model to better explain asset returns.

Why Is It Called Regression?

Although there is some debate about the origins of the name, the statistical technique described above most likely was termed “regression” by Sir Francis Galton in the 19th century to describe the statistical feature of biological data (such as heights of people in a population) to regress to some mean level. In other words, while there are shorter and taller people, only outliers are very tall or short, and most people cluster somewhere around (or “regress” to) the average.

What Is the Purpose of Regression?

In statistical analysis, regression is used to identify the associations between variables occurring in some data. It can show the magnitude of such an association and determine its statistical significance. Regression is a powerful tool for statistical inference and has been used to try to predict future outcomes based on past observations.

How Do You Interpret a Regression Model?

A regression model output may be in the form of $Y = 1.0 + 3.2X_1 - 2.0X_2 + 0.21$.

Here we have a multiple linear regression that relates some variable Y to two explanatory variables, $X_1$ and $X_2$. We would interpret the model as follows: Y changes by 3.2 units for every one-unit change in $X_1$ (if $X_1$ goes up by 2, Y goes up by 6.4, etc.), holding all else constant. That means that, controlling for $X_2$, $X_1$ has this observed relationship with Y. Likewise, holding $X_1$ constant, every one-unit increase in $X_2$ is associated with a 2-unit decrease in Y. We can also note the y-intercept of 1.0, meaning that Y = 1 when $X_1$ and $X_2$ are both zero. The error term (residual) is 0.21.
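
The interpretation can be checked numerically. The short Python sketch below encodes the example coefficients in a hypothetical `predict` function and measures how the fitted value moves when one variable changes and the other is held fixed. The specific input values are arbitrary choices for illustration.

```python
def predict(x1, x2):
    """Fitted value from the example model, ignoring the residual."""
    return 1.0 + 3.2 * x1 - 2.0 * x2


intercept = predict(0, 0)                    # both predictors at zero
effect_x1 = predict(3, 5) - predict(1, 5)    # X1 up by 2, X2 held at 5
effect_x2 = predict(1, 6) - predict(1, 5)    # X2 up by 1, X1 held at 1

# A residual is the gap between an observed Y and the fitted value.
observed_y = predict(1, 5) + 0.21            # hypothetical observation
residual = observed_y - predict(1, 5)
```

The intercept is 1.0, raising X1 by 2 moves the fitted value by about 6.4, raising X2 by 1 moves it by about -2.0, and the residual for the hypothetical observation is about 0.21.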

What Are the Assumptions That Must Hold for Regression Models?

To properly interpret the output of a regression model, the following main assumptions about the underlying data process of what you are analyzing must hold:

  • The relationship between variables is linear;
  • There must be homoskedasticity, meaning the variance of the error term must remain constant;
  • All explanatory variables are independent of one another;
  • The error terms (residuals) are normally distributed.

The Bottom Line

Regression is a statistical method that tries to determine the strength and character of the relationship between one dependent variable and a series of other variables. It is used in finance, investing, and other disciplines.

Regression analysis uncovers the associations between variables observed in data, but cannot easily indicate causation.



Understanding regression analysis: overview and key uses

Last updated: 22 August 2024. Reviewed by Miroslav Damyanov.

Regression analysis is a fundamental statistical method that helps us predict and understand how different factors (aka independent variables) influence a specific outcome (aka dependent variable). 

Imagine you're trying to predict the value of a house. Regression analysis can help you create a formula to estimate the house's value by looking at variables like the home's size and the neighborhood's average income. This method is crucial because it allows us to predict and analyze trends based on data. 

While that example is straightforward, the technique can be applied to more complex situations, offering valuable insights into fields such as economics, healthcare, marketing, and more.

3 uses for regression analysis in business

Businesses can use regression analysis to improve nearly every aspect of their operations. When used correctly, it's a powerful tool for learning how adjusting variables can improve outcomes. Here are three applications:

1. Prediction and forecasting

Predicting future scenarios can give businesses significant advantages. No method can guarantee absolute certainty, but regression analysis offers a reliable framework for forecasting future trends based on past data. Companies can apply this method to anticipate future sales for financial planning purposes and predict inventory requirements for more efficient space and cost management. Similarly, an insurance company can employ regression analysis to predict the likelihood of claims for more accurate underwriting. 

2. Identifying inefficiencies and opportunities

Regression analysis can help us understand how the relationships between different business processes affect outcomes. Its ability to model complex relationships means that regression analysis can accurately highlight variables that lead to inefficiencies, which intuition alone may not do. Regression analysis allows businesses to improve performance significantly through targeted interventions. For instance, a manufacturing plant experiencing production delays, machine downtime, or labor shortages can use regression analysis to determine the underlying causes of these issues.

3. Making data-driven decisions

Regression analysis can enhance decision-making for any situation that relies on dependent variables. For example, a company can analyze the impact of various price points on sales volume to find the best pricing strategy for its products. Understanding buying behavior factors can help segment customers into buyer personas for improved targeting and messaging.

Types of regression models

There are several types of regression models, each suited to a particular purpose. Picking the right one is vital to getting the correct results. 

Simple linear regression analysis is the simplest form of regression analysis. It examines the relationship between exactly one dependent variable and one independent variable, fitting a straight line to the data points on a graph.

Multiple regression analysis examines how two or more independent variables affect a single dependent variable. It extends simple linear regression and requires a more complex algorithm.

Multivariate linear regression is suitable for multiple dependent variables. It allows the analysis of how independent variables influence multiple outcomes.

Logistic regression is relevant when the dependent variable is categorical, such as binary outcomes (e.g., true/false or yes/no). Logistic regression estimates the probability of a category based on the independent variables.

6 mistakes people make with regression analysis

Ignoring key variables is a common mistake when working with regression analysis. Here are a few more pitfalls to try and avoid:

1. Overfitting the model

If a model is too complex, it can fit the training data too closely, a problem known as overfitting. This is especially likely when some independent variables have no real effect on the dependent variable, though it can happen whenever the model over-adjusts to fit all the variables. In such cases, the model starts memorizing noise rather than meaningful patterns. When this happens, the model’s results will fit the training data almost perfectly but fail to generalize to new, unseen data, rendering the model ineffective for prediction or inference.

2. Underfitting the model

A less complex model is less likely to overfit. However, if the model is too simplistic, it will face the opposite problem: underfitting. In this case, the model will fail to capture the underlying patterns in the data, meaning it won't perform well on either the training data or new, unseen data. This lack of complexity prevents the model from making accurate predictions or drawing meaningful inferences.

3. Neglecting model validation

Model validation is how you can be sure that a model isn't overfitting or underfitting. Imagine teaching a child to read. If you always read the same book to the child, they might memorize it and recite it perfectly, making it seem like they’ve learned to read. However, if you give them a new book, they might struggle and be unable to read it.

This scenario is similar to a model that performs well on its training data but fails with new data. Model validation involves testing the model with data it hasn’t seen before. If the model performs well on this new data, that indicates it has truly learned to generalize. On the other hand, if the model only performs well on the training data and poorly on new data, it has overfitted to the training data, much like the child who can only recite the memorized book.
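
A simple hold-out check can be sketched in Python as follows. Everything here is an illustrative assumption: the synthetic data, the fixed random seed, the 30/10 split, and the helper names `fit_line`, `rmse`, and `memorizer_predict`. The "memorizer" mimics the child reciting a familiar book by returning the stored answer for the closest training point.

```python
import math
import random


def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Synthetic data: a straight line plus random noise (seeded for repeatability).
rng = random.Random(0)
xs = [i / 2 for i in range(40)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 1.0) for x in xs]

# Hold out 10 of the 40 points; the model never sees them during fitting.
indices = list(range(len(xs)))
rng.shuffle(indices)
train_idx, test_idx = indices[:30], indices[30:]
x_train, y_train = [xs[i] for i in train_idx], [ys[i] for i in train_idx]
x_test, y_test = [xs[i] for i in test_idx], [ys[i] for i in test_idx]

a, b = fit_line(x_train, y_train)
line_train_rmse = rmse(y_train, [a + b * x for x in x_train])
line_test_rmse = rmse(y_test, [a + b * x for x in x_test])


def memorizer_predict(x_new):
    """Return the stored y of the closest training x (pure memorization)."""
    nearest = min(range(len(x_train)), key=lambda i: abs(x_train[i] - x_new))
    return y_train[nearest]


memo_train_rmse = rmse(y_train, [memorizer_predict(x) for x in x_train])
memo_test_rmse = rmse(y_test, [memorizer_predict(x) for x in x_test])
```

The memorizer scores a perfect zero error on the training data, yet its error on the held-out points is not zero. Comparing training and test error in this way is what reveals whether a model has generalized.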

4. Multicollinearity

Regression analysis works best when the independent variables are genuinely independent. However, sometimes, two or more variables are highly correlated. This multicollinearity can make it hard for the model to accurately determine each variable's impact. 

If a model gives poor results, checking for correlated variables may reveal the issue. You can fix it by removing one or more correlated variables or using a principal component analysis (PCA) technique, which transforms the correlated variables into a set of uncorrelated components.
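
One way to check for correlated predictors is to compute the Pearson correlation between each pair and, for a pair, the variance inflation factor (VIF) 1 / (1 - r²). The Python sketch below uses invented marketing data; the variable names and the common rule of thumb that a VIF above roughly 10 signals trouble are assumptions for illustration, not thresholds from this article.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


# Hypothetical predictors: impressions track ad spend almost exactly,
# while price moves more or less independently of ad spend.
ad_spend = [10, 12, 15, 18, 20, 25]
impressions = [101, 119, 152, 178, 203, 248]
price = [9, 7, 8, 6, 9, 7]

r_spend_impressions = pearson(ad_spend, impressions)
r_spend_price = pearson(ad_spend, price)

# With two predictors, the VIF of either one depends only on their correlation.
vif_spend_impressions = 1 / (1 - r_spend_impressions ** 2)
vif_spend_price = 1 / (1 - r_spend_price ** 2)
```

In this made-up example, ad spend and impressions are almost perfectly correlated and produce a very large VIF, so keeping both in the model would make their individual coefficients unreliable. Ad spend and price are only weakly correlated, and their VIF stays close to 1.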

5. Misinterpreting coefficients

Errors are not always due to the model itself; human error is common. These mistakes often involve misinterpreting the results. For example, someone might misunderstand the units of measure and draw incorrect conclusions. Another frequent issue in scientific analysis is confusing correlation and causation. Regression analysis can only provide insights into correlation, not causation.

6. Poor data quality

The adage “garbage in, garbage out” strongly applies to regression analysis. When low-quality data is input into a model, it analyzes noise rather than meaningful patterns. Poor data quality can manifest as missing values, unrepresentative data, outliers, and measurement errors. Additionally, the model may have excluded essential variables significantly impacting the results. All these issues can distort the relationships between variables and lead to misleading results. 

What are the assumptions that must hold for regression models?

To correctly interpret the output of a regression model, the following key assumptions about the underlying data process must hold:

The relationship between variables is linear.

There must be homoscedasticity, meaning the variance of the error term must remain constant.

All explanatory variables are independent of one another.

The error terms (residuals) are normally distributed.

Real-life examples of regression analysis

Let's turn our attention to how a few industries use regression analysis to improve their outcomes:

Regression analysis has many applications in healthcare, but two of the most common are improving patient outcomes and optimizing resources. 

Hospitals need to use resources effectively to ensure the best patient outcomes. Regression models can help forecast patient admissions, equipment and supply usage, and more. These models allow hospitals to plan and maximize their resources. 

Predicting stock prices, economic trends, and financial risks benefits the finance industry. Regression analysis can help finance professionals make informed decisions about these topics. 

For example, analysts often use regression analysis to assess how changes to GDP, interest rates, and unemployment rates impact stock prices. Armed with this information, they can make more informed portfolio decisions. 

The banking industry also uses regression analysis. When a loan underwriter determines whether to grant a loan, regression analysis allows them to calculate the probability that a potential borrower will repay the loan.

Imagine how much more effective a company's marketing efforts could be if they could predict customer behavior. Regression analysis allows them to do so with a degree of accuracy. For example, marketers can analyze how price, advertising spend, and product features (combined) influence sales. Once they've identified key sales drivers, they can adjust their strategy to maximize revenue. They may approach this analysis in stages. 

For instance, if they determine that ad spend is the biggest driver, they can apply regression analysis to data specific to advertising efforts. Doing so allows them to improve the ROI of ads. The opposite may also be true. If ad spending has little to no impact on sales, something is wrong that regression analysis might help identify. 

Regression analysis tools and software

Regression analysis by hand isn't practical. The process requires large numbers and complex calculations. Computers make even the most complex regression analysis possible. Even the most complicated AI algorithms can be considered fancy regression calculations. Many tools exist to help users create these regressions.

MATLAB is a commercial programming language and computing environment, and the open-source project GNU Octave aims to implement much of the same functionality. Both are designed for complex mathematical operations, including regression analysis. MATLAB's tools for computation and visualization have made it very popular in academia, engineering, and industry for calculating regression and displaying the results. MATLAB also integrates with add-on toolboxes, so developers can extend its functionality and build application-specific solutions.

Python is a more general-purpose programming language, but many libraries are available that extend its functionality. For regression analysis, packages like scikit-learn and statsmodels provide the computational tools necessary for the job, while packages like pandas and Matplotlib can handle large amounts of data and display the results. Python is a simple-to-learn, easy-to-read programming language, which can give it a leg up over the more dedicated math and statistics languages.

SAS (Statistical Analysis System) is a commercial software suite for advanced analytics, multivariate analysis, business intelligence, and data management. It includes a procedure called PROC REG that allows users to efficiently perform regression analysis on their data. The software is well-known for its data-handling capabilities, extensive documentation, and technical support. These factors make it a common choice for large-scale enterprise use and industries requiring rigorous statistical analysis. 

Stata is another statistical software package. It provides an integrated environment for data analysis, data management, and graphics, and it includes commands for a range of regression analysis tasks. Its popularity is due to its ease of use, reproducibility, and ability to handle complex datasets intuitively. The extensive documentation helps beginners get started quickly. Stata is widely used in academic research, economics, sociology, and political science.

Most people know Excel, but you might not know that Microsoft's spreadsheet software has an add-in called Analysis ToolPak that can perform basic linear regression and visualize the results. Excel is not an excellent choice for more complex regression or very large datasets. But for those with basic needs who only want to analyze smaller datasets quickly, it's a convenient option already in many tech stacks.

SPSS (Statistical Package for the Social Sciences) is a versatile statistical analysis software widely used in social science, business, and health. It offers tools for various analyses, including regression, making it accessible to users through its user-friendly interface. SPSS enables users to manage and visualize data, perform complex analyses, and generate reports without coding. Its extensive documentation and support make it popular in academia and industry, allowing for efficient handling of large datasets and reliable results.

What is a regression analysis in simple terms?

Regression analysis is a statistical method used to estimate and quantify the relationship between a dependent variable and one or more independent variables. It helps determine the strength and direction of these relationships, allowing predictions about the dependent variable based on the independent variables and providing insights into how each independent variable impacts the dependent variable.

What are the main types of variables used in regression analysis?

Dependent variables: typically continuous (e.g., house price) or binary (e.g., yes/no outcomes).

Independent variables: can be continuous, categorical, binary, or ordinal.

What does a regression analysis tell you?

Regression analysis identifies the relationships between a dependent variable and one or more independent variables. It quantifies the strength and direction of these relationships, allowing you to predict the dependent variable based on the independent variables and understand the impact of each independent variable on the dependent variable.


Research Method


Regression Analysis – Methods, Types and Examples


Regression Analysis

Regression analysis is a set of statistical processes for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables (or ‘predictors’).

Regression Analysis Methodology

Here is a general methodology for performing regression analysis:

  • Define the research question: Clearly state the research question or hypothesis you want to investigate. Identify the dependent variable (also called the response variable or outcome variable) and the independent variables (also called predictor variables or explanatory variables) that you believe are related to the dependent variable.
  • Collect data: Gather the data for the dependent variable and independent variables. Ensure that the data is relevant, accurate, and representative of the population or phenomenon you are studying.
  • Explore the data: Perform exploratory data analysis to understand the characteristics of the data, identify any missing values or outliers, and assess the relationships between variables through scatter plots, histograms, or summary statistics.
  • Choose the regression model: Select an appropriate regression model based on the nature of the variables and the research question. Common regression models include linear regression, multiple regression, logistic regression, polynomial regression, and time series regression, among others.
  • Assess assumptions: Check the assumptions of the regression model. Some common assumptions include linearity (the relationship between variables is linear), independence of errors, homoscedasticity (constant variance of errors), and normality of errors. Violation of these assumptions may require additional steps or alternative models.
  • Estimate the model: Use a suitable method to estimate the parameters of the regression model. The most common method is ordinary least squares (OLS), which minimizes the sum of squared differences between the observed and predicted values of the dependent variable.
  • Interpret the results: Analyze the estimated coefficients, p-values, confidence intervals, and goodness-of-fit measures (e.g., R-squared) to interpret the results. Determine the significance and direction of the relationships between the independent variables and the dependent variable.
  • Evaluate model performance: Assess the overall performance of the regression model using appropriate measures, such as R-squared, adjusted R-squared, and root mean squared error (RMSE). These measures indicate how well the model fits the data and how much of the variation in the dependent variable is explained by the independent variables.
  • Test assumptions and diagnose problems: Check the residuals (the differences between observed and predicted values) for any patterns or deviations from assumptions. Conduct diagnostic tests, such as examining residual plots, testing for multicollinearity among independent variables, and assessing heteroscedasticity or autocorrelation, if applicable.
  • Make predictions and draw conclusions: Once you have a satisfactory model, use it to make predictions on new or unseen data. Draw conclusions based on the results of the analysis, considering the limitations and potential implications of the findings.
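
The estimate, diagnose, and predict steps above can be sketched in plain Python for the single-predictor case. This is a minimal illustration rather than a full workflow: the function name `fit_simple_ols` and the advertising/sales numbers are hypothetical, and a real analysis would normally rely on a statistics package.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Hypothetical data: advertising spend (x) and sales (y)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Estimate the model
intercept, slope = fit_simple_ols(x, y)

# Diagnose: residuals are observed minus predicted values
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# Predict for a new, unseen value of x
new_prediction = intercept + slope * 6.0
print(intercept, slope, new_prediction)
```

With an intercept in the model, the OLS residuals sum to zero by construction, so the diagnostic value lies in their pattern rather than their average.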

Types of Regression Analysis

Types of Regression Analysis are as follows:

Linear Regression

Linear regression is the most basic and widely used form of regression analysis. It models the linear relationship between a dependent variable and one or more independent variables. The goal is to find the best-fitting line that minimizes the sum of squared differences between observed and predicted values.

Multiple Regression

Multiple regression extends linear regression by incorporating two or more independent variables to predict the dependent variable. It allows for examining the simultaneous effects of multiple predictors on the outcome variable.

Polynomial Regression

Polynomial regression models non-linear relationships between variables by adding polynomial terms (e.g., squared or cubic terms) to the regression equation. It can capture curved or nonlinear patterns in the data.
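
As a rough sketch of how polynomial terms enter the model, the example below fits a quadratic by solving the least-squares normal equations in plain Python. The helper names `solve_linear_system` and `fit_polynomial` and the data are hypothetical; the data are generated exactly from a known quadratic so the recovered coefficients can be checked.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix so row operations act on both sides
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_polynomial(x, y, degree):
    """Least-squares coefficients [b0, b1, ..., b_degree] for y ~ sum of b_k * x^k."""
    k = degree + 1
    # Normal equations: (X^T X) b = X^T y, where column j of X holds x**j
    xtx = [[sum(xi ** (i + j) for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum((xi ** i) * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical data generated exactly from y = 1 + 2x + 3x^2
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [1 + 2 * xi + 3 * xi ** 2 for xi in x]
coefficients = fit_polynomial(x, y, degree=2)
print(coefficients)
```

The model is still linear in its coefficients, which is why the same least-squares machinery used for linear regression applies here.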

Logistic Regression

Logistic regression is used when the dependent variable is binary or categorical. It models the probability of the occurrence of a certain event or outcome based on the independent variables. Logistic regression estimates the coefficients using the logistic function, which transforms the linear combination of predictors into a probability.

Ridge Regression and Lasso Regression

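Ridge regression and Lasso regression are techniques used for addressing multicollinearity (high correlation between independent variables) and, in the case of Lasso, variable selection. Both methods add a penalty term to the least-squares objective that shrinks the coefficients toward zero. Ridge regression uses L2 regularization, which shrinks coefficients without setting them exactly to zero, while Lasso regression uses L1 regularization, which can set some coefficients exactly to zero and so remove less important variables from the model.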
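
The difference between the two penalties can be seen with a single centered predictor and no intercept, where both problems have a closed-form solution. The sketch below assumes the objectives stated in the docstrings (other texts scale the penalty differently), and the function names and data are hypothetical.

```python
def ridge_slope(x, y, lam):
    """Minimize sum((y - b*x)^2) + lam * b^2 for a single centered predictor."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)

def lasso_slope(x, y, lam):
    """Minimize sum((y - b*x)^2) + lam * |b| via soft-thresholding."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    shrunk = max(abs(sxy) - lam / 2.0, 0.0)
    sign = 1.0 if sxy >= 0 else -1.0
    return sign * shrunk / sxx

# Hypothetical centered data (both x and y have mean zero)
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-1.0, -0.5, 0.0, 0.5, 1.0]

ols = ridge_slope(x, y, lam=0.0)      # no penalty: ordinary least squares
ridge = ridge_slope(x, y, lam=10.0)   # shrunk toward zero, but not zero
lasso = lasso_slope(x, y, lam=20.0)   # penalty large enough to give exactly zero
print(ols, ridge, lasso)
```

In this toy case the L2 penalty only rescales the coefficient, whereas the L1 penalty subtracts a fixed amount and clips at zero, which is what allows Lasso to drop a variable entirely.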

Time Series Regression

Time series regression analyzes the relationship between a dependent variable and independent variables when the data is collected over time. It accounts for autocorrelation and trends in the data and is used in forecasting and studying temporal relationships.

Nonlinear Regression

Nonlinear regression models are used when the relationship between the dependent variable and independent variables is not linear. These models can take various functional forms and require estimation techniques different from those used in linear regression.

Poisson Regression

Poisson regression is employed when the dependent variable represents count data. It models the relationship between the independent variables and the expected count, assuming a Poisson distribution for the dependent variable.

Generalized Linear Models (GLM)

GLMs are a flexible class of regression models that extend the linear regression framework to handle different types of dependent variables, including binary, count, and continuous variables. GLMs incorporate various probability distributions and link functions.

Regression Analysis Formulas

Regression analysis involves estimating the parameters of a regression model to describe the relationship between the dependent variable (Y) and one or more independent variables (X). Here are the basic formulas for linear regression, multiple regression, and logistic regression:

Linear Regression:

Simple Linear Regression Model: Y = β0 + β1X + ε

Multiple Linear Regression Model: Y = β0 + β1X1 + β2X2 + … + βnXn + ε

In both formulas:

  • Y represents the dependent variable (response variable).
  • X represents the independent variable(s) (predictor variable(s)).
  • β0, β1, β2, …, βn are the regression coefficients or parameters that need to be estimated.
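  • ε represents the error term, the part of Y that the model does not explain. Its sample counterpart, the residual, is the difference between an observed value and the value predicted by the fitted model.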
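
For the simple linear model, the least-squares estimates of the two coefficients have a well-known closed form. In the block below, the hats (estimates), the bars (sample means), and the index i over the n observations are notation introduced here for illustration and do not appear in the formulas above.

```latex
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}
```

The slope is the sample covariance of X and Y divided by the sample variance of X, and the intercept forces the fitted line through the point of means.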

Multiple Regression:

Multiple regression extends the concept of simple linear regression by including multiple independent variables.

Multiple Regression Model: Y = β0 + β1X1 + β2X2 + … + βnXn + ε

The formulas are similar to those in linear regression, with the addition of more independent variables.

Logistic Regression:

Logistic regression is used when the dependent variable is binary or categorical. The logistic regression model applies a logistic or sigmoid function to the linear combination of the independent variables.

Logistic Regression Model: p = 1 / (1 + e^-(β0 + β1X1 + β2X2 + … + βnXn))

In the formula:

  • p represents the probability of the event occurring (e.g., the probability of success or belonging to a certain category).
  • X1, X2, …, Xn represent the independent variables.
  • e is the base of the natural logarithm.

The logistic function ensures that the predicted probabilities lie between 0 and 1, allowing for binary classification.
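
The formula can be evaluated directly. The sketch below is a minimal illustration in which the function name `logistic_probability` and the coefficient values are hypothetical; the coefficients are not estimated from any data.

```python
import math

def logistic_probability(coefficients, features):
    """Return p = 1 / (1 + e^-(b0 + b1*x1 + ... + bn*xn))."""
    intercept, *slopes = coefficients
    linear = intercept + sum(b * x for b, x in zip(slopes, features))
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients (b0, b1, b2), not estimated from real data
coefficients = [-1.5, 0.8, 0.3]

p_low = logistic_probability(coefficients, [0.0, 0.0])
p_high = logistic_probability(coefficients, [3.0, 2.0])
p_mid = logistic_probability([0.0, 1.0], [0.0])  # linear part is 0, so p = 0.5
print(p_low, p_high, p_mid)
```

For very large negative values of the linear combination, `math.exp` can overflow, so numerically careful implementations usually branch on the sign of the input; that refinement is left out here for brevity.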

Regression Analysis Examples

Regression Analysis Examples are as follows:

  • Stock Market Prediction: Regression analysis can be used to predict stock prices based on various factors such as historical prices, trading volume, news sentiment, and economic indicators. Traders and investors can use this analysis to make informed decisions about buying or selling stocks.
  • Demand Forecasting: In retail and e-commerce, regression analysis can help forecast demand for products. By analyzing historical sales data along with real-time data such as website traffic, promotional activities, and market trends, businesses can adjust their inventory levels and production schedules to meet customer demand more effectively.
  • Energy Load Forecasting: Utility companies often use real-time regression analysis to forecast electricity demand. By analyzing historical energy consumption data, weather conditions, and other relevant factors, they can predict future energy loads. This information helps them optimize power generation and distribution, ensuring a stable and efficient energy supply.
  • Online Advertising Performance: Regression analysis can be used to assess the performance of online advertising campaigns. By analyzing real-time data on ad impressions, click-through rates, conversion rates, and other metrics, advertisers can adjust their targeting, messaging, and ad placement strategies to maximize their return on investment.
  • Predictive Maintenance: Regression analysis can be applied to predict equipment failures or maintenance needs. By continuously monitoring sensor data from machines or vehicles, regression models can identify patterns or anomalies that indicate potential failures. This enables proactive maintenance, reducing downtime and optimizing maintenance schedules.
  • Financial Risk Assessment: Real-time regression analysis can help financial institutions assess the risk associated with lending or investment decisions. By analyzing real-time data on factors such as borrower financials, market conditions, and macroeconomic indicators, regression models can estimate the likelihood of default or assess the risk-return tradeoff for investment portfolios.

Importance of Regression Analysis

Importance of Regression Analysis is as follows:

  • Relationship Identification: Regression analysis helps in identifying and quantifying the relationship between a dependent variable and one or more independent variables. It allows us to determine how changes in independent variables impact the dependent variable. This information is crucial for decision-making, planning, and forecasting.
  • Prediction and Forecasting: Regression analysis enables us to make predictions and forecasts based on the relationships identified. By estimating the values of the dependent variable using known values of independent variables, regression models can provide valuable insights into future outcomes. This is particularly useful in business, economics, finance, and other fields where forecasting is vital for planning and strategy development.
  • Causality Assessment: While correlation does not imply causation, regression analysis provides a framework for assessing causality by considering the direction and strength of the relationship between variables. It allows researchers to control for other factors and assess the impact of a specific independent variable on the dependent variable. This helps in determining the causal effect and identifying significant factors that influence outcomes.
  • Model Building and Variable Selection: Regression analysis aids in model building by determining the most appropriate functional form of the relationship between variables. It helps researchers select relevant independent variables and eliminate irrelevant ones, reducing complexity and improving model accuracy. This process is crucial for creating robust and interpretable models.
  • Hypothesis Testing: Regression analysis provides a statistical framework for hypothesis testing. Researchers can test the significance of individual coefficients, assess the overall model fit, and determine if the relationship between variables is statistically significant. This allows for rigorous analysis and validation of research hypotheses.
  • Policy Evaluation and Decision-Making: Regression analysis plays a vital role in policy evaluation and decision-making processes. By analyzing historical data, researchers can evaluate the effectiveness of policy interventions and identify the key factors contributing to certain outcomes. This information helps policymakers make informed decisions, allocate resources effectively, and optimize policy implementation.
  • Risk Assessment and Control: Regression analysis can be used for risk assessment and control purposes. By analyzing historical data, organizations can identify risk factors and develop models that predict the likelihood of certain outcomes, such as defaults, accidents, or failures. This enables proactive risk management, allowing organizations to take preventive measures and mitigate potential risks.

When to Use Regression Analysis

  • Prediction: Regression analysis is often employed to predict the value of the dependent variable based on the values of independent variables. For example, you might use regression to predict sales based on advertising expenditure, or to predict a student’s academic performance based on variables like study time, attendance, and previous grades.
  • Relationship analysis: Regression can help determine the strength and direction of the relationship between variables. It can be used to examine whether there is a linear association between variables, identify which independent variables have a significant impact on the dependent variable, and quantify the magnitude of those effects.
  • Causal inference: Regression analysis can be used to explore cause-and-effect relationships by controlling for other variables. For example, in a medical study, you might use regression to determine the impact of a specific treatment while accounting for other factors like age, gender, and lifestyle.
  • Forecasting: Regression models can be utilized to forecast future trends or outcomes. By fitting a regression model to historical data, you can make predictions about future values of the dependent variable based on changes in the independent variables.
  • Model evaluation: Regression analysis can be used to evaluate the performance of a model or test the significance of variables. You can assess how well the model fits the data, determine if additional variables improve the model’s predictive power, or test the statistical significance of coefficients.
  • Data exploration: Regression analysis can help uncover patterns and insights in the data. By examining the relationships between variables, you can gain a deeper understanding of the data set and identify potential patterns, outliers, or influential observations.

Applications of Regression Analysis

Here are some common applications of regression analysis:

  • Economic Forecasting: Regression analysis is frequently employed in economics to forecast variables such as GDP growth, inflation rates, or stock market performance. By analyzing historical data and identifying the underlying relationships, economists can make predictions about future economic conditions.
  • Financial Analysis: Regression analysis plays a crucial role in financial analysis, such as predicting stock prices or evaluating the impact of financial factors on company performance. It helps analysts understand how variables like interest rates, company earnings, or market indices influence financial outcomes.
  • Marketing Research: Regression analysis helps marketers understand consumer behavior and make data-driven decisions. It can be used to predict sales based on advertising expenditures, pricing strategies, or demographic variables. Regression models provide insights into which marketing efforts are most effective and help optimize marketing campaigns.
  • Health Sciences: Regression analysis is extensively used in medical research and public health studies. It helps examine the relationship between risk factors and health outcomes, such as the impact of smoking on lung cancer or the relationship between diet and heart disease. Regression analysis also helps in predicting health outcomes based on various factors like age, genetic markers, or lifestyle choices.
  • Social Sciences: Regression analysis is widely used in social sciences like sociology, psychology, and education research. Researchers can investigate the impact of variables like income, education level, or social factors on various outcomes such as crime rates, academic performance, or job satisfaction.
  • Operations Research: Regression analysis is applied in operations research to optimize processes and improve efficiency. For example, it can be used to predict demand based on historical sales data, determine the factors influencing production output, or optimize supply chain logistics.
  • Environmental Studies: Regression analysis helps in understanding and predicting environmental phenomena. It can be used to analyze the impact of factors like temperature, pollution levels, or land use patterns on phenomena such as species diversity, water quality, or climate change.
  • Sports Analytics: Regression analysis is increasingly used in sports analytics to gain insights into player performance, team strategies, and game outcomes. It helps analyze the relationship between various factors like player statistics, coaching strategies, or environmental conditions and their impact on game outcomes.

Advantages and Disadvantages of Regression Analysis

| Advantages of Regression Analysis | Disadvantages of Regression Analysis |
| --- | --- |
| Provides a quantitative measure of the relationship between variables | Assumes a linear relationship between variables, which may not always hold true |
| Helps in predicting and forecasting outcomes based on historical data | Requires a large sample size to produce reliable results |
| Identifies and measures the significance of independent variables on the dependent variable | Assumes no multicollinearity, meaning that independent variables should not be highly correlated with each other |
| Provides estimates of the coefficients that represent the strength and direction of the relationship between variables | Assumes the absence of outliers or influential data points |
| Allows for hypothesis testing to determine the statistical significance of the relationship | Can be sensitive to the inclusion or exclusion of certain variables, leading to different results |
| Can handle both continuous and categorical variables | Assumes the independence of observations, which may not hold true in some cases |
| Offers a visual representation of the relationship through the use of scatter plots and regression lines | May not capture complex non-linear relationships between variables without appropriate transformations |
| Provides insights into the marginal effects of independent variables on the dependent variable | Requires the assumption of homoscedasticity, meaning that the variance of errors is constant across all levels of the independent variables |

About the author

Muhammad Hassan

Researcher, Academic Writer, Web developer

MIT News | Massachusetts Institute of Technology

Explained: Regression analysis

Regression Analysis

  • In book: A Concise Guide to Market Research (pp.193-233)

Marko Sarstedt at Ludwig-Maximilians-University of Munich

Erik Mooi at University of Melbourne

A Refresher on Regression Analysis

Understanding one of the most important types of data analysis.

You probably know by now that whenever possible you should be making data-driven decisions at work. But do you know how to parse through all the data available to you? The good news is that you probably don’t need to do the number crunching yourself (hallelujah!) but you do need to correctly understand and interpret the analysis created by your colleagues. One of the most important types of data analysis is called regression analysis.

  • Amy Gallo is a contributing editor at Harvard Business Review, cohost of the Women at Work podcast, and the author of two books: Getting Along: How to Work with Anyone (Even Difficult People) and the HBR Guide to Dealing with Conflict. She writes and speaks about workplace dynamics. Watch her TEDx talk on conflict and follow her on LinkedIn.

Regression analysis is a widely used set of statistical analysis methods for gauging the true impact of various factors on specific facets of a business. These methods help data analysts better understand relationships between variables, make predictions, and decipher intricate patterns within data. Regression analysis enables better predictions and more informed decision-making by tapping into historical data to forecast future outcomes. It informs the highest levels of strategic decision-making at the world’s leading enterprises, enabling them to achieve successful outcomes at scale in virtually all domains and industries. In this article, we delve into the essence of regression analysis, exploring its mechanics, applications, various types, and the benefits it brings to the table for enterprises that invest in it.

What is Regression Analysis?

Enterprises have long sought the proverbial “secret sauce” to increasing revenue. While a definitive formula for boosting sales has yet to be discovered, powerful advances in statistics and data science have made it easier to grasp relationships between potentially influential factors and reported sales results and earnings.

In the world of data analytics and statistical modeling, regression analysis stands out for its versatility and predictive power. At its core, it involves modeling the relationship between one or more independent variables and a dependent variable—in essence, asking how changes in one correspond to changes in the other.

How Does Regression Analysis Work?

Regression analysis works by constructing a mathematical model that represents the relationships among the variables in question. This model is expressed as an equation that captures the expected influence of each independent variable on the dependent variable.

End-to-end, the regression analysis process consists of data collection and preparation, model selection, parameter estimation, and model evaluation.

Step 1: Data Collection and Preparation

The first step in regression analysis involves gathering and preparing the data. As with any data analytics, data quality is imperative—in this context, preparation includes identifying all dependent and independent variables, cleaning the data, handling missing values, and transforming variables as needed.

Step 2: Model Selection

In this step, the appropriate regression model is selected based on the nature of the data and the research question. For example, a simple linear regression is suitable when exploring a single predictor, while multiple linear regression is better for use cases with multiple predictors. Polynomial regression, logistic regression, and other specialized forms can be employed for various other use cases.

Step 3: Parameter Estimation

The next step is to estimate the model parameters. For linear regression, this involves finding the coefficients (slopes and intercepts) that best fit the data. This is most often accomplished using the least squares method, which minimizes the sum of squared differences between observed and predicted values.

Step 4: Model Evaluation

Model evaluation is critical for determining the model’s goodness of fit and predictive accuracy. This process involves assessing such metrics as the coefficient of determination (R-squared), mean squared error (MSE), and others. Visualization tools—scatter plots and residual plots, for example—can aid in understanding how well the model captures the data’s patterns.
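
These metrics are straightforward to compute once predictions are available. The sketch below is illustrative only: the function name `evaluate_fit` and the data are hypothetical, MSE is assumed to divide by the number of observations (some texts divide by the residual degrees of freedom instead), and adjusted R-squared is included as a common companion measure.

```python
import math

def evaluate_fit(observed, predicted, n_predictors):
    """Return MSE, RMSE, R-squared, and adjusted R-squared for a fitted model."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    mse = ss_res / n
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return {"mse": mse, "rmse": math.sqrt(mse), "r2": r2, "adj_r2": adj_r2}

# Hypothetical observed values and model predictions
observed = [3.0, 5.0, 7.0, 9.0, 11.0]
predicted = [2.0, 5.0, 8.0, 9.0, 11.0]
metrics = evaluate_fit(observed, predicted, n_predictors=1)
print(metrics)
```

Adjusted R-squared penalizes additional predictors, so it is usually the safer of the two R-squared measures when comparing models of different sizes.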

Interpreting the Results of Regression Analysis

In order to be actionable, data must be transformed into information. In a similar sense, once the regression analysis has yielded results, they must be interpreted. This includes interpreting coefficients and significance, determining goodness of fit, and performing residual analysis.

Interpreting Coefficients and Significance

Interpreting regression coefficients is crucial for understanding the relationships between variables. A positive coefficient suggests a positive relationship; a negative coefficient suggests a negative relationship.

The significance of coefficients is determined through hypothesis testing, a common statistical method for deciding whether sample data contain sufficient evidence to draw conclusions, and is summarized by the p-value. The smaller the p-value, the stronger the evidence against the null hypothesis that the coefficient is zero. A small p-value does not by itself mean the effect is large.
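
For simple linear regression, the test statistic behind the p-value is the estimated slope divided by its standard error. The sketch below computes that t-statistic in plain Python with hypothetical data and a hypothetical function name. It stops short of the p-value itself, which comes from comparing the statistic to a Student's t distribution with n - 2 degrees of freedom and is normally left to a statistics package.

```python
import math

def slope_t_statistic(x, y):
    """Return (slope, standard_error, t_statistic) for simple linear regression."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    residual_variance = ss_res / (n - 2)   # two estimated parameters
    standard_error = math.sqrt(residual_variance / sxx)
    return slope, standard_error, slope / standard_error

# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
slope, standard_error, t_stat = slope_t_statistic(x, y)
print(slope, standard_error, t_stat)
```

A larger absolute t-statistic corresponds to a smaller p-value for the same sample size.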

Determining Goodness of Fit

The coefficient of determination, denoted R-squared, indicates the proportion of the variance in the dependent variable explained by the independent variables. A higher R-squared value suggests a better fit, but a high R-squared does not imply that the independent variables cause the outcome.

Performing Residual Analysis

Analyzing residuals helps validate the assumptions of regression analysis. In a well-fitting model, residuals are randomly scattered around zero. Patterns in residuals could indicate violations of assumptions or omitted variables that should be included in the model.
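
One simple numeric check for patterns in ordered residuals is the Durbin-Watson statistic, which is not named in the text above and is offered here only as an example of a residual diagnostic. Values near 2 suggest little first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The residual series below are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 suggests little first-order autocorrelation."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals listed in observation order
alternating = [1.0, -1.0, 1.0, -1.0, 1.0]   # sign flips every step
trending = [-2.0, -1.0, 0.0, 1.0, 2.0]      # smooth drift, a visible pattern

dw_alternating = durbin_watson(alternating)
dw_trending = durbin_watson(trending)
print(dw_alternating, dw_trending)
```

Both series are far from 2, which is the kind of signal that would prompt a closer look at a residual plot.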

Key Assumptions of Regression Analysis

To yield reliable and meaningful results, regression analysis relies on the assumptions of linearity, independence, homoscedasticity, normality, and no multicollinearity.

  • Linearity. The relationship between independent and dependent variables is assumed to be linear. This means that the change in the dependent variable is directly proportional to changes in the independent variable(s).
  • Independence. The residuals—differences between observed and predicted values—should be independent of each other. In other words, the value of the residual for one data point should not provide information about the residual for another data point.
  • Homoscedasticity. The variance of residuals should remain consistent across all levels of the independent variables. If the variance of residuals changes systematically, it indicates heteroscedasticity, which makes standard errors and significance tests unreliable.
  • Normality. Residuals should follow a normal distribution. While this assumption is more crucial for smaller sample sizes, violations can impact the reliability of statistical inference and hypothesis testing in many scenarios.
  • No multicollinearity. Multicollinearity—a statistical phenomenon where several independent variables in a model are correlated—makes interpreting individual variable contributions difficult and may result in unreliable coefficient estimates. In multiple linear regression, independent variables should not be highly correlated.
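
The multicollinearity assumption in the list above can be screened with the correlation between predictors and the variance inflation factor (VIF). The sketch below covers only the two-predictor case, where the VIF reduces to 1 / (1 - r^2). The function names and data are hypothetical, and the VIF is a standard diagnostic that the text above does not itself name.

```python
import math

def pearson_correlation(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    a_mean = sum(a) / n
    b_mean = sum(b) / n
    cov = sum((ai - a_mean) * (bi - b_mean) for ai, bi in zip(a, b))
    var_a = sum((ai - a_mean) ** 2 for ai in a)
    var_b = sum((bi - b_mean) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors."""
    r = pearson_correlation(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical predictors
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_correlation(x1, x2)
vif = vif_two_predictors(x1, x2)
print(r, vif)
```

With more than two predictors, each VIF is computed from the R-squared of regressing that predictor on all the others, which calls for a multiple regression fit rather than a single correlation.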

Types of Regression Analysis

There are many regression analysis techniques available for different use cases. Simple linear regression and logistic regression cover many common scenarios, and the following are some of the most commonly used approaches overall.

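| Type | Description |
| --- | --- |
| Simple linear regression | Studies relationship between two variables (predictor and outcome) |
| Multiple linear regression | Captures impact of all variables |
| Polynomial regression | Finds and represents complex patterns and non-linear relationships |
| Logistic regression | Estimates probability based on predictor variables |
| Ridge regression | Used in cases with high correlation between variables; can also be used as a regularization method for accuracy |
| Lasso regression | Used to minimize effect of correlated variables on predictions |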

Common types of regression analysis.

Simple Linear Regression

Useful for exploring the relationship between two continuous variables in straightforward cause-and-effect investigations, simple linear regression is the most basic form of regression analysis. It involves studying the relationship between two variables: an independent variable (the predictor) and a dependent variable (the outcome).

Source: https://upload.wikimedia.org/wikipedia/commons/b/b0/Linear_least_squares_example2.svg

Multiple Linear Regression (MLR)

Multiple linear regression extends the concept of simple linear regression by capturing the combined impact of several independent variables, allowing for a more comprehensive analysis of how those factors collectively influence the outcome.

Source: https://cdn.corporatefinanceinstitute.com/assets/multiple-linear-regression.png

Polynomial Regression

For non-linear relationships, polynomial regression accommodates curves and enables accurate representation of complex patterns. This method involves fitting a polynomial equation to the data, allowing for more flexible modeling of complex relationships. For example, a second order polynomial regression—also known as a quadratic regression—can be used to capture a U-shaped or inverted U-shaped pattern in the data.

Source: https://en.wikipedia.org/wiki/Polynomial_regression#/media/File:Polyreg_scheffe.svg

Logistic Regression

Logistic regression estimates the probability of an event occurring based on one or more predictor variables. In contrast to linear regression, logistic regression is designed to predict categorical outcomes, which are typically binary in nature—for example, yes/no or 0/1.

Source: https://en.m.wikipedia.org/wiki/File:Exam_pass_logistic_curve.svg

Ridge Regression

Ridge regression is typically employed when there is a high correlation between the independent variables. It yields models that are less susceptible to overfitting and acts as a regularization method that reduces the impact of correlated variables on model accuracy.

Source: https://www.statology.org/ridge-regression-in-r/

Lasso Regression

Like ridge regression, lasso regression—short for least absolute shrinkage and selection operator—works by minimizing the effect that correlated variables have on a model’s predictive capabilities.


Source: https://www.statology.org/lasso-regression-in-r/

Regression Analysis Benefits and Use Cases

Because it taps into historical data to forecast future outcomes, regression analysis enables better predictions and more informed decision-making, giving it tremendous value for enterprises in all fields. It’s used at the highest levels of the world’s leading enterprises in fields from finance to marketing to help achieve successful outcomes at scale.

For example, regression analysis plays a crucial role in the optimization of transportation and logistics operations. By predicting demand patterns, it allows enterprises to adjust inventory levels and optimize their supply chain management efforts. It can also help optimize routes by identifying factors that influence travel times and delivery delays, ultimately leading to more accurate scheduling and resource allocation, and assists in fleet management by predicting maintenance needs.

Here are other examples of how other industries use regression analysis:

  • Economics and finance. Regression models help economists understand the interplay of variables such as interest rates, inflation, and consumer spending, guiding monetary strategy and policy decisions and economic forecasts.
  • Healthcare. Medical researchers employ regression analysis to determine how factors like age, lifestyle choices, genetics, and environmental factors contribute to health outcomes to aid in the design of personalized treatment plans and mechanisms for predicting disease risks.
  • Marketing and business. Enterprises use regression analysis to understand consumer behavior, optimize pricing strategies, and evaluate marketing campaign effectiveness.

Challenges and Limitations

Despite its power, regression analysis is not without challenges and limitations. For example, overfitting occurs when a model is too complex and fits the noise in the data rather than the underlying patterns, and multicollinearity can lead to unstable coefficient estimates.

To deal with these issues, methods such as correlation analysis, variance inflation factor (VIF), and principal component analysis (PCA) can be used to identify and remove redundant variables. Regularization methods using additional regression techniques—ridge regression, lasso regression, and elastic net regression, for example—can help to reduce the impact of correlated variables on the model’s accuracy.
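To make the VIF idea concrete, here is a minimal sketch for the special case of two predictors, where the VIF reduces to 1 / (1 - r^2) with r the Pearson correlation between them. The variable names and values are hypothetical. A VIF well above roughly 5 to 10 is a commonly used rule of thumb for problematic collinearity, not a hard threshold.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF for either predictor in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical predictor values
spend_tv = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
spend_radio = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]  # nearly 2 * spend_tv
store_count = [5.0, 1.0, 4.0, 2.0, 6.0, 3.0]   # almost unrelated to spend_tv

vif_correlated = vif_two_predictors(spend_tv, spend_radio)
vif_unrelated = vif_two_predictors(spend_tv, store_count)
print(f"VIF (correlated pair): {vif_correlated:.1f}")
print(f"VIF (unrelated pair):  {vif_unrelated:.3f}")
```

With more than two predictors, each VIF is instead 1 / (1 - R^2) from regressing that predictor on all the others.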

Inherently, regression analysis methods assume that relationships are constant across all levels of the independent variables. But this assumption might not hold true in all cases. For example, modeling the relationship between an app’s ease-of-use and subscription renewal rate may not be well-represented by a linear model, as subscription renewals may increase exponentially or logarithmically with the level of usability.

Bottom Line: Regression Analysis for Enterprise Use

Regression analysis is an indispensable tool in the arsenal of data analysts and researchers. It allows for the decoding of hidden relationships, more accurate outcome predictions, and revelations hidden inside intricate data dynamics that can aid in strategic decision-making.

While it has limitations, many of them can be minimized with the use of other analytical methods. With a solid understanding of its mechanisms, types, and applications, enterprises across nearly all domains can harness its potential to extract valuable information.

Doing so requires investment—not just in the right data analytics and visualization tools and expertise, but in a commitment to collect and prepare high quality data and train staff to incorporate it into decision-making processes. Regression analysis should be just one of the arrows in a business’s data analytics and data management quiver.


Regression analysis includes several variations, such as linear, multiple linear, and nonlinear. The most common models are simple linear and multiple linear. Nonlinear regression analysis is commonly used for more complicated data sets in which the dependent and independent variables show a nonlinear relationship.

Regression analysis offers numerous applications in various disciplines, including finance.

Regression Analysis – Linear Model Assumptions

Linear regression analysis is based on six fundamental assumptions:

  • The dependent and independent variables show a linear relationship, described by a slope and an intercept.
  • The independent variable is not random.
  • The expected (mean) value of the residual (error) is zero.
  • The variance of the residual (error) is constant across all observations.
  • The residual (error) values are not correlated across observations.
  • The residual (error) values follow the normal distribution.

Regression Analysis – Simple Linear Regression

Simple linear regression is a model that assesses the relationship between a dependent variable and an independent variable. The simple linear model is expressed using the following equation:

Y = a + bX + ϵ

  • Y – Dependent variable
  • X – Independent (explanatory) variable
  • a – Intercept
  • b – Slope
  • ϵ – Residual (error)
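A minimal sketch of how the intercept a and slope b in the equation above can be estimated by ordinary least squares. The function name and the sample data are hypothetical, introduced only for illustration.

```python
def fit_simple_linear(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope b = sum of cross-deviations / sum of squared X deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # The fitted line passes through the point of means
    a = mean_y - b * mean_x
    return a, b

# Hypothetical observations
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_simple_linear(xs, ys)
# Residuals: the part of Y the fitted line does not explain
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
```

When the model includes an intercept, the fitted residuals sum to zero (up to rounding).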


Regression Analysis – Multiple Linear Regression

Multiple linear regression analysis is essentially similar to the simple linear model, with the exception that multiple independent variables are used in the model. The mathematical representation of multiple linear regression is:

Y = a + bX₁ + cX₂ + dX₃ + ϵ

  • X₁, X₂, X₃ – Independent (explanatory) variables
  • b, c, d – Slopes

Multiple linear regression follows the same conditions as the simple linear model. However, since there are several independent variables in multiple linear analysis, there is another mandatory condition for the model:

  • Non-collinearity: Independent variables should show a minimum correlation with each other. If the independent variables are highly correlated with each other, it will be difficult to assess the true relationships between the dependent and independent variables.
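The sketch below estimates the coefficients of a model of the form Y = a + bX1 + cX2 + dX3 by solving the least-squares normal equations with Gaussian elimination. It is an illustration only: the helper names and the data are hypothetical, and the data is generated without noise so that the true coefficients are recovered.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_multiple_linear(rows, ys):
    """Least-squares fit of Y = a + b*X1 + c*X2 + ... via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 models the intercept
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(k)]
    return solve_linear_system(XtX, Xty)

# Hypothetical noise-free data generated from Y = 1 + 2*X1 + 3*X2 - X3
rows = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 0.0),
        (1.0, 0.0, 1.0), (0.0, 1.0, 1.0), (2.0, 1.0, 3.0)]
ys = [1.0 + 2.0 * x1 + 3.0 * x2 - x3 for x1, x2, x3 in rows]

coefficients = fit_multiple_linear(rows, ys)
print([round(c, 6) for c in coefficients])
```

If two independent variables were perfectly collinear, the normal equations would have no unique solution and the elimination step would divide by zero, which is one way to see why the non-collinearity condition matters.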

Regression analysis comes with several applications in finance. For example, the statistical method is fundamental to the Capital Asset Pricing Model (CAPM). Essentially, the CAPM equation is a model that determines the relationship between the expected return of an asset and the market risk premium.

The analysis is also used to forecast the returns of securities, based on different factors, or to forecast the performance of a business.

1. Beta and CAPM

In finance, regression analysis is used to calculate the Beta (the volatility of a stock’s returns relative to the overall market) for a stock. It can be done in Excel using the SLOPE function.
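For readers working outside Excel, a rough equivalent of that slope calculation is sketched below: Beta is the least-squares slope from regressing the stock's returns on the market's returns. The return series are made-up numbers used purely for illustration.

```python
def beta(stock_returns, market_returns):
    """Slope of the regression of stock returns on market returns."""
    n = len(market_returns)
    mean_m = sum(market_returns) / n
    mean_s = sum(stock_returns) / n
    # Sum of cross-deviations and sum of squared market deviations
    cov = sum((m - mean_m) * (s - mean_s)
              for m, s in zip(market_returns, stock_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    return cov / var

# Hypothetical periodic returns (as decimal fractions)
market = [0.010, -0.020, 0.030, 0.015, -0.005]
stock = [0.012, -0.031, 0.047, 0.020, -0.010]

stock_beta = beta(stock, market)
print(f"beta = {stock_beta:.2f}")
```

A Beta above 1 indicates that the stock has tended to move more than the market over the sample period.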


2. Forecasting Revenues and Expenses

When forecasting financial statements for a company, it may be useful to do a multiple regression analysis to determine how changes in certain assumptions or drivers of the business will impact revenue or expenses in the future. For example, there may be a very high correlation between the number of salespeople employed by a company, the number of stores they operate, and the revenue the business generates.

In the simple linear case, the FORECAST function in Excel can be used to estimate a company’s revenue based on a single driver, such as the number of ads it runs.

Excel remains a popular tool for conducting basic regression analysis in finance. However, there are many more advanced statistical tools that can be used.

Python and R are both powerful coding languages that have become popular for all types of financial modeling, including regression. These techniques form a core part of data science and machine learning, where models are trained to detect these relationships in data.


Regression Analysis: Definition, Types, Usage & Advantages


Regression analysis is perhaps one of the most widely used statistical methods for investigating or estimating the relationship between a set of independent and dependent variables. In statistical analysis , distinguishing between categorical data and numerical data is essential, as categorical data involves distinct categories or labels, while numerical data consists of measurable quantities.

It is also used as a blanket term for various data analysis techniques utilized in quantitative research for modeling and analyzing numerous variables. In the regression method, the independent variable is a predictor or an explanatory element, and the dependent variable is the outcome or a response to a specific query.


Definition of Regression Analysis


Regression analysis is often used to model or analyze data. Most survey analysts use it to understand the relationship between the variables, which can be further utilized to predict the precise outcome.

For Example – Suppose a soft drink company wants to expand its manufacturing unit to a newer location. Before moving forward, the company wants to analyze its revenue generation model and the various factors that might impact it. Hence, the company conducts an online survey with a specific questionnaire.

After using regression analysis, it becomes easier for the company to analyze the survey results and understand the relationship between different variables like electricity and revenue – here, revenue is the dependent variable.


In addition, understanding the relationship between different independent variables like pricing, number of workers, and logistics with the revenue helps the company estimate the impact of varied factors on sales and profits.

Survey researchers often use this technique to examine and find a correlation between different variables of interest. It provides an opportunity to gauge the influence of different independent variables on a dependent variable.

Overall, regression analysis saves the survey researchers’ additional efforts in arranging several independent variables in tables and testing or calculating their effect on a dependent variable. Different types of analytical research methods are widely used to evaluate new business ideas and make informed decisions.

Types of Regression Analysis

Researchers usually start by learning linear and logistic regression first. Due to the widespread knowledge of these two methods and ease of application, many analysts think there are only two types of models. Each model has its own specialty and ability to perform if specific conditions are met.

This blog explains seven commonly used types of regression analysis that can be used to interpret data in various formats.

01. Linear Regression Analysis

It is one of the most widely known modeling techniques, as it is among the first regression analysis methods people learn when studying predictive modeling. Here, the dependent variable is continuous, and the independent variable can be continuous or discrete; the relationship between them is modeled with a straight regression line.

Please note that multiple linear regression differs from simple linear regression in having more than one independent variable. In either form, linear regression is best used only when there is a linear relationship between the independent and dependent variables.

A business can use linear regression to measure the effectiveness of the marketing campaigns, pricing, and promotions on sales of a product. Suppose a company selling sports equipment wants to understand if the funds they have invested in the marketing and branding of their products have given them substantial returns or not.

Linear regression is a suitable statistical method for interpreting the results. It also helps isolate the impact of each marketing and branding activity on sales while controlling for the effect of the others.

If the company is running two or more advertising campaigns simultaneously, one on television and two on radio, then linear regression can easily analyze the independent and combined influence of running these advertisements together.


02. Logistic Regression Analysis

Logistic regression is commonly used to determine the probability of event success and event failure. Logistic regression is used whenever the dependent variable is binary, like 0/1, True/False, or Yes/No. Thus, it can be said that logistic regression is used to analyze either the close-ended questions in a survey or the questions demanding numeric responses in a survey.

Please note that, unlike linear regression, logistic regression does not need a linear relationship between the dependent and independent variables. Logistic regression applies a non-linear logit (log-odds) transformation to the predicted probability; therefore, it can handle various types of relationships between a dependent and an independent variable.
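The sketch below illustrates the idea: a linear score a + b*x is treated as log-odds and passed through the logistic (sigmoid) function to obtain a probability. The coefficients are fitted with plain gradient descent, which is a simple stand-in here rather than what statistical packages typically use, and the data, learning rate, and step count are hypothetical choices.

```python
import math

def sigmoid(z):
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(Y=1) = sigmoid(a + b*x) by gradient descent on the log-loss."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the average log-loss with respect to a and b
        grad_a = sum(sigmoid(a + b * x) - y for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(a + b * x) - y) * x for x, y in zip(xs, ys)) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

# Hypothetical binary outcomes (1 = event occurred) for a single predictor
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

a, b = fit_logistic(xs, ys)
p_low = sigmoid(a + b * 1.0)
p_high = sigmoid(a + b * 5.0)
print(f"P(Y=1 | x=1.0) = {p_low:.2f}, P(Y=1 | x=5.0) = {p_high:.2f}")
```

The fitted probabilities always stay between 0 and 1, which is what makes the model suitable for binary outcomes.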

Logistic regression is widely used to analyze categorical data, particularly binary response data in business data modeling. Most often, it is used when the dependent variable is categorical, for example to predict whether a health claim made by a person is real (1) or fraudulent (0), or whether a tumor is malignant (1) or not (0).

Businesses use logistic regression to predict whether the consumers in a particular demographic will purchase their product or will buy from the competitors based on age, income, gender, race, state of residence, previous purchase, etc.

03. Polynomial Regression Analysis

Polynomial regression is commonly used to analyze curvilinear data when an independent variable’s power is more than 1. In this regression analysis method, the best-fit line is never a ‘straight line’ but always a ‘curve line’ fitting into the data points.

Please note that polynomial regression is a good choice when some of the variables enter the model with exponents greater than 1 and others do not.

Additionally, it can model non-linearly separable data offering the liberty to choose the exact exponent for each variable, and that too with full control over the modeling features available.

When combined with response surface analysis, polynomial regression is considered one of the sophisticated statistical methods commonly used in multisource feedback research. Polynomial regression is used mostly in finance and insurance-related industries where the relationship between dependent and independent variables is curvilinear.

Suppose a person wants to budget expense planning by determining how long it would take to earn a definitive sum. Polynomial regression, by taking into account his/her income and predicting expenses, can easily determine the precise time he/she needs to work to earn that specific sum amount.

04. Stepwise Regression Analysis

This is a semi-automated process in which a statistical model is built by adding or removing independent variables based on the t-statistics of their estimated coefficients.

If used properly, stepwise regression can be a powerful way to narrow down a model. It works well when you are working with a large number of independent variables, fine-tuning the model by adding or dropping one variable at a time.

Stepwise regression analysis is recommended to be used when there are multiple independent variables, wherein the selection of independent variables is done automatically without human intervention.

Please note, in stepwise regression modeling, the variable is added or subtracted from the set of explanatory variables. The set of added or removed variables is chosen depending on the test statistics of the estimated coefficient.

Suppose you have a set of independent variables like age, weight, body surface area, duration of hypertension, basal pulse, and stress index based on which you want to analyze its impact on the blood pressure.

In stepwise regression, the best subset of the independent variable is automatically chosen; it either starts by choosing no variable to proceed further (as it adds one variable at a time) or starts with all variables in the model and proceeds backward (removes one variable at a time).

Thus, using regression analysis, you can calculate the impact of each or a group of variables on blood pressure.

05. Ridge Regression Analysis

Ridge regression is based on an ordinary least square method which is used to analyze multicollinearity data (data where independent variables are highly correlated). Collinearity can be explained as a near-linear relationship between variables.

Whenever there is multicollinearity, the least squares estimates are unbiased, but their variances are large, so they may be far from the true value. Ridge regression reduces the standard errors by adding some degree of bias to the regression estimates, with the aim of providing more reliable estimates.


Please note that the assumptions of ridge regression are the same as those of least squares regression, except that normality is not assumed. Although ridge regression shrinks the coefficient values, they never reach exactly zero, which means ridge regression cannot perform variable selection.

Suppose you are crazy about two guitarists performing live at an event near you, and you go to watch their performance with a motive to find out who is a better guitarist. But when the performance starts, you notice that both are playing black-and-blue notes at the same time.

Is it possible to find out which guitarist has the biggest impact on the sound when they are both playing loud and fast? Because both of them are playing at the same time, it is substantially difficult to tell their contributions apart, making it a good illustration of multicollinearity, which tends to increase the standard errors of the coefficients.

Ridge regression addresses multicollinearity in cases like these and includes bias or a shrinkage estimation to derive results.
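A minimal sketch of the shrinkage effect for two highly correlated predictors. It solves the ridge equations (X'X + lambda*I) b = X'y directly for the two-variable case, assuming the inputs are already centered so that no intercept is needed. The function name, the data, and the penalty value are all hypothetical.

```python
def ridge_two_predictors(x1, x2, y, lam):
    """Ridge estimates for y = b1*x1 + b2*x2 (centered inputs, no intercept)."""
    # Entries of X'X with the penalty lam added to the diagonal
    s11 = sum(a * a for a in x1) + lam
    s22 = sum(b * b for b in x2) + lam
    s12 = sum(a * b for a, b in zip(x1, x2))
    # Entries of X'y
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    # Solve the 2x2 system with Cramer's rule
    det = s11 * s22 - s12 * s12
    b1 = (t1 * s22 - t2 * s12) / det
    b2 = (t2 * s11 - t1 * s12) / det
    return b1, b2

# Hypothetical centered data; x1 and x2 are almost identical (collinear)
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.1, 0.9, 2.0]
y = [-4.2, -1.7, 0.1, 2.1, 3.7]

ols = ridge_two_predictors(x1, x2, y, lam=0.0)    # lam = 0 is ordinary least squares
ridge = ridge_two_predictors(x1, x2, y, lam=1.0)
print("least squares:", [round(v, 3) for v in ols])
print("ridge (lam=1):", [round(v, 3) for v in ridge])
```

With the penalty switched on, the overall size of the coefficient vector shrinks, but neither coefficient is driven exactly to zero.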

06. Lasso Regression Analysis

Lasso (Least Absolute Shrinkage and Selection Operator) is similar to ridge regression; however, it uses an absolute value bias instead of the square bias used in ridge regression.

It was popularized by Robert Tibshirani in 1996 as an alternative to the traditional least-squares estimate, with the intention of reducing the overfitting problems that arise when the data has a large number of independent variables.

Lasso can both select variables and regularize them, using a soft threshold. Applying lasso regression makes it easier to derive a subset of predictors that minimizes prediction error when analyzing a quantitative response.

Please note that predictors whose regression coefficients shrink to zero are excluded from the lasso model. Conversely, predictors with non-zero coefficients are the ones most strongly associated with the response variable. The explanatory variables can be quantitative, categorical, or both.
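The soft threshold and the resulting variable selection can be sketched with a small coordinate-descent routine. This is an illustrative implementation, not the algorithm used by any particular package; it assumes centered predictors, no intercept, and the objective (1/2) * sum of squared errors + lam * sum of absolute coefficients. The data and penalty value are hypothetical.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, sweeps=200):
    """Minimize 0.5 * sum((y - Xb)^2) + lam * sum(|b_j|) one coefficient at a time."""
    n_features = len(X[0])
    coefs = [0.0] * n_features
    for _ in range(sweeps):
        for j in range(n_features):
            # rho: correlation of feature j with the residual that ignores feature j
            rho = 0.0
            for row, target in zip(X, y):
                partial = sum(row[k] * coefs[k] for k in range(n_features) if k != j)
                rho += row[j] * (target - partial)
            z = sum(row[j] ** 2 for row in X)
            coefs[j] = soft_threshold(rho, lam) / z
    return coefs

# Hypothetical centered data: y depends strongly on the first column, weakly on the second
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.5], [1.0, -1.0], [2.0, 0.5]]
y = [3.0 * r[0] + 0.1 * r[1] for r in X]

unpenalized = lasso_coordinate_descent(X, y, lam=0.0)
penalized = lasso_coordinate_descent(X, y, lam=2.0)
print("lam = 0:", [round(c, 3) for c in unpenalized])
print("lam = 2:", [round(c, 3) for c in penalized])
```

With the penalty applied, the weak predictor's coefficient is set exactly to zero, which is the selection behavior described above.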

Suppose an automobile company wants to perform a research analysis on average fuel consumption by cars in the US. For samples, they chose 32 models of car and 10 features of automobile design – Number of cylinders, Displacement, Gross horsepower, Rear axle ratio, Weight, ¼ mile time, v/s engine, transmission, number of gears, and number of carburetors.

The response variable mpg (miles per gallon) is strongly correlated with some of these variables, such as weight, displacement, number of cylinders, and horsepower. The problem can be analyzed by using the glmnet package in R and lasso regression for feature selection.

07. Elastic Net Regression Analysis

It is a mixture of ridge and lasso regression models trained with L1 and L2 norms. The elastic net brings about a grouping effect wherein strongly correlated predictors tend to be in/out of the model together. Using the elastic net regression model is recommended when the number of predictors is far greater than the number of observations.

Please note that the elastic net regression model came into existence as an alternative to the lasso regression model, because lasso’s variable selection was too dependent on the data, making it unstable. By using elastic net regression, statisticians can combine the penalties of ridge and lasso regression to get the best out of both models.

A clinical research team having access to a microarray data set on leukemia (LEU) was interested in constructing a diagnostic rule based on the expression level of presented gene samples for predicting the type of leukemia. The data set they had, consisted of a large number of genes and a few samples.

Apart from that, they were given a specific set of samples to be used as training samples, out of which some were infected with type 1 leukemia (acute lymphoblastic leukemia) and some with type 2 leukemia (acute myeloid leukemia).

Model fitting and tuning parameter selection by tenfold CV were carried out on the training data. Then they compared the performance of those methods by computing their prediction mean-squared error on the test data to get the necessary results.

Regression Analysis Usage in Market Research

A market research survey focuses on three major metrics: Customer Satisfaction, Customer Loyalty, and Customer Advocacy. Remember, although these metrics tell us about customer health and intentions, they fail to tell us ways of improving the position. Therefore, an in-depth survey questionnaire intended to ask consumers the reason behind their dissatisfaction is definitely a way to gain practical insights.

However, it has been found that people often struggle to put forth their motivation or demotivation or describe their satisfaction or dissatisfaction. In addition to that, people often give undue importance to some rational factors, such as price, packaging, etc. In such cases, regression analysis acts as a predictive analytics and forecasting tool in market research.

When used as a forecasting tool, regression analysis can determine an organization’s sales figures by taking into account external market data. A multinational company conducts a market research survey to understand the impact of various factors such as GDP (Gross Domestic Product), CPI (Consumer Price Index), and other similar factors on its revenue generation model.

Regression analysis, taking forecasted marketing indicators into consideration, was used to predict a tentative revenue that will be generated in future quarters and even in future years. However, the further into the future you go, the more unreliable the data becomes, leaving a wide margin of error.

Case study of using regression analysis

A water purifier company wanted to understand the factors leading to brand favorability. A survey was the best medium for reaching out to existing and prospective customers. A large-scale consumer survey was planned, and a questionnaire was prepared using a survey tool.

A number of questions related to the brand, favorability, satisfaction, and probable dissatisfaction were effectively asked in the survey. After getting optimum responses to the survey, regression analysis was used to narrow down the top ten factors responsible for driving brand favorability.

All ten attributes derived highlighted, in one way or another, their importance in impacting the favorability of that specific water purifier brand.

How Regression Analysis Derives Insights from Surveys

It is easy to run a regression analysis using Excel or SPSS, but while doing so, the importance of four numbers in interpreting the data must be understood.

The first two numbers out of the four numbers directly relate to the regression model itself.

  • F-Value: It helps in measuring the statistical significance of the model as a whole. Remember, it is the significance level (p-value) associated with the F-statistic, not the F-statistic itself, that is compared with 0.05. A significance level below 0.05 suggests that the survey analysis output is unlikely to be due to chance.
  • R-Squared: This is the proportion of the variation in the dependent variable that the independent variables explain. If the R-Squared value is 0.7, the tested independent variables explain 70% of the variation in the dependent variable. A high value means the model fits the survey data well, although it does not by itself guarantee accurate predictions.
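As a small worked illustration of R-Squared, the sketch below computes it as 1 minus the ratio of the residual sum of squares to the total sum of squares. The observed and predicted values are hypothetical.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` explained by `predicted`."""
    mean_y = sum(actual) / len(actual)
    # Variation left unexplained by the model
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    # Total variation around the mean
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed values and model predictions
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 6.5, 9.5]

r2 = r_squared(actual, predicted)
print(f"R-squared = {r2:.2f}")
```

Predicting the mean for every observation gives an R-Squared of 0, and perfect predictions give 1.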

The other two numbers relate to each of the independent variables while interpreting regression analysis.

  • P-Value: Like the significance of the F-Value, the P-Value is a measure of statistical significance. Here, it indicates how statistically significant each independent variable’s effect is. Once again, we are looking for a value of less than 0.05.
  • Coefficient: The fourth number is the coefficient obtained for each independent variable. It tells us by what value the dependent variable is expected to increase when the independent variable under consideration increases by one while all other independent variables are held at the same value.

In a few cases, the simple coefficient is replaced by a standardized coefficient demonstrating the contribution from each independent variable to move or bring about a change in the dependent variable.

Advantages of Using Regression Analysis in an Online Survey

01. Get access to predictive analytics

Do you know utilizing regression analysis to understand the outcome of a business survey is like having the power to unveil future opportunities and risks?

For example, by analyzing data on a particular television advertisement slot, a business can predict the response the slot is likely to generate and use that estimate to set a maximum bid for it. The finance and insurance industry as a whole depends a lot on regression analysis of survey data to identify trends and opportunities for more accurate planning and decision-making.

02. Enhance operational efficiency

Do you know businesses use regression analysis to optimize their business processes?

For example, before launching a new product line, businesses conduct consumer surveys to better understand the impact of various factors on the product’s production, packaging, distribution, and consumption.

A data-driven foresight helps eliminate the guesswork, hypothesis, and internal politics from decision-making. A deeper understanding of the areas impacting operational efficiencies and revenues leads to better business optimization.

03. Quantitative support for decision-making

Business surveys today generate a lot of data related to finance, revenue, operation, purchases, etc., and business owners are heavily dependent on various data analysis models to make informed business decisions.

For example, regression analysis helps enterprises to make informed strategic workforce decisions. Conducting and interpreting the outcome of employee surveys like Employee Engagement Surveys, Employee Satisfaction Surveys, Employer Improvement Surveys, Employee Exit Surveys, etc., boosts the understanding of the relationship between employees and the enterprise.

It also helps get a fair idea of certain issues impacting the organization’s working culture, working environment, and productivity. Furthermore, intelligent business-oriented interpretations reduce the huge pile of raw data into actionable information to make a more informed decision.

04. Prevent mistakes from happening due to intuitions

By knowing how to use regression analysis for interpreting survey results, one can easily provide factual support to management for making informed decisions. It also helps in keeping errors of judgment out of those decisions.

For example, a mall manager thinks that extending the closing time of the mall will result in more sales. Regression analysis may contradict this belief by showing that the additional revenue from increased sales would not cover the increased operating expenses arising from longer working hours.

Regression analysis is a useful statistical method for modeling and comprehending the relationships between variables. It provides numerous advantages to various data types and interactions. Researchers and analysts may gain useful insights into the factors influencing a dependent variable and use the results to make informed decisions. 


U.S. flag

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

  • Publications
  • Account settings

Preview improvements coming to the PMC website in October 2024. Learn More or Try it out now .

Cardiopulm Phys Ther J. 2009 Sep; 20(3).

Regression Analysis for Prediction: Understanding the Process

Phillip B Palmer

1 Hardin-Simmons University, Department of Physical Therapy, Abilene, TX

Dennis G O'Connell

2 Hardin-Simmons University, Department of Physical Therapy, Abilene, TX

Research related to cardiorespiratory fitness often uses regression analysis in order to predict cardiorespiratory status or future outcomes. Reading these studies can be tedious and difficult unless the reader has a thorough understanding of the processes used in the analysis. This feature seeks to “simplify” the process of regression analysis for prediction in order to help readers understand this type of study more easily. Examples of the use of this statistical technique are provided in order to facilitate better understanding.

INTRODUCTION

Graded, maximal exercise tests that directly measure maximum oxygen consumption (VO₂max) are impractical in most physical therapy clinics because they require expensive equipment and personnel trained to administer the tests. Performing these tests in the clinic may also require medical supervision; as a result, researchers have sought to develop exercise and non-exercise models that would allow clinicians to predict VO₂max without having to perform direct measurement of oxygen uptake. In most cases, the investigators utilize regression analysis to develop their prediction models.

Regression analysis is a statistical technique for determining the relationship between a single dependent (criterion) variable and one or more independent (predictor) variables. The analysis yields a predicted value for the criterion resulting from a linear combination of the predictors. According to Pedhazur, 15 regression analysis has 2 uses in scientific literature: prediction, including classification, and explanation. The following provides a brief review of the use of regression analysis for prediction. Specific emphasis is given to the selection of the predictor variables (assessing model efficiency and accuracy) and cross-validation (assessing model stability). The discussion is not intended to be exhaustive. For a more thorough explanation of regression analysis, the reader is encouraged to consult one of many books written about this statistical technique (eg, Fox; 5 Kleinbaum, Kupper, & Muller; 12 Pedhazur; 15 and Weisberg 16 ). Examples of the use of regression analysis for prediction are drawn from a study by Bradshaw et al. 3 In this study, the researchers' stated purpose was to develop an equation for prediction of cardiorespiratory fitness (CRF) based on non-exercise (N-EX) data.

SELECTING THE CRITERION (OUTCOME MEASURE)

The first step in regression analysis is to determine the criterion variable. Pedhazur 15 suggests that the criterion have acceptable measurement qualities (ie, reliability and validity). Bradshaw et al 3 used VO₂max as the criterion of choice for their model and measured it using a maximum graded exercise test (GXT) developed by George. 6 George 6 indicated that his protocol for testing compared favorably with the Bruce protocol in terms of predictive ability and had good test-retest reliability (ICC = .98–.99). The American College of Sports Medicine indicates that measurement of VO₂max is the “gold standard” for measuring cardiorespiratory fitness. 1 These facts support that the criterion selected by Bradshaw et al 3 was appropriate and meets the requirements for acceptable reliability and validity.

SELECTING THE PREDICTORS: MODEL EFFICIENCY

Once the criterion has been selected, predictor variables should be identified (model selection). The aim of model selection is to minimize the number of predictors which account for the maximum variance in the criterion. 15 In other words, the most efficient model maximizes the value of the coefficient of determination (R²). This coefficient estimates the amount of variance in the criterion score accounted for by a linear combination of the predictor variables. The higher the value is for R², the less error or unexplained variance and, therefore, the better prediction. R² is dependent on the multiple correlation coefficient (R), which describes the relationship between the observed and predicted criterion scores. If there is no difference between the predicted and observed scores, R equals 1.00. This represents a perfect prediction with no error and no unexplained variance (R² = 1.00). When R equals 0.00, there is no relationship between the predictor(s) and the criterion and no variance in scores has been explained (R² = 0.00). The chosen variables cannot predict the criterion. The goal of model selection is, as stated previously, to develop a model that results in the highest estimated value for R².
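The two boundary cases described above (perfect prediction and no explained variance) can be checked with a small sketch. It assumes the usual definition R² = 1 − SS_error / SS_total; the function name and the scores are hypothetical and not taken from any study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_error / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_error / ss_total

# Hypothetical criterion scores (illustration only).
observed = [30.0, 35.0, 40.0, 45.0, 50.0]
perfect = list(observed)   # predictions match every observed score
mean_only = [40.0] * 5     # predictions ignore the predictors entirely

print(r_squared(observed, perfect))    # no unexplained variance
print(r_squared(observed, mean_only))  # no variance explained
```

For a least-squares model with an intercept, the multiple correlation R is the square root of this value.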

According to Pedhazur, 15 the value of R is often overestimated. The reasons for this are beyond the scope of this discussion; however, the degree of overestimation is affected by sample size. The larger the ratio is between the number of predictors and subjects, the larger the overestimation. To account for this, sample sizes should be large and there should be 15 to 30 subjects per predictor. 11 , 15 Of course, the most effective way to determine optimal sample size is through statistical power analysis. 11 , 15

Another method of determining the best model for prediction is to test the significance of adding one or more variables to the model using the partial F-test. This process, which is further discussed by Kleinbaum, Kupper, and Muller, 12 allows for exclusion of predictors that do not contribute significantly to the prediction, allowing determination of the most efficient model of prediction. In general, the partial F-test is similar to the F-test used in analysis of variance. It assesses the statistical significance of the difference between values for R² derived from 2 or more prediction models using a subset of the variables from the original equation. For example, Bradshaw et al 3 indicated that all variables contributed significantly to their prediction. Though the researchers do not detail the procedure used, it is highly likely that different models were tested, excluding one or more variables, and the resulting values for R² assessed for statistical difference.

Although the techniques discussed above are useful in determining the most efficient model for prediction, theory must be considered in choosing the appropriate variables. Previous research should be examined and predictors selected for which a relationship between the criterion and predictors has been established. 12 , 15

It is clear that Bradshaw et al 3 relied on theory and previous research to determine the variables to use in their prediction equation. The 5 variables they chose for inclusion–gender, age, body mass index (BMI), perceived functional ability (PFA), and physical activity rating (PA-R)–had been shown in previous studies to contribute to the prediction of VO₂max (eg, Heil et al; 8 George, Stone, & Burkett 7 ). These 5 predictors accounted for 87% (R = .93, R² = .87) of the variance in the predicted values for VO₂max. Based on a ratio of 1:20 (predictor:sample size), this estimate of R, and thus R², is not likely to be overestimated. The researchers used changes in the value of R² to determine whether to include or exclude these or other variables. They reported that removal of perceived functional ability (PFA) as a variable resulted in a decrease in R from .93 to .89. Without this variable, the remaining 4 predictors would account for only 79% of the variance in VO₂max. The investigators did note that each predictor variable contributed significantly (p < .05) to the prediction of VO₂max (see above discussion related to the partial F-test).
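To make the partial F-test concrete, the sketch below applies the standard R²-based form of the statistic to the rounded figures quoted above (R² = .87 with 5 predictors, R² = .79 without PFA). The sample size of 100 is an assumption inferred from the 1:20 ratio, and the resulting F is an illustration only, not a value reported by the authors.

```python
def partial_f(r2_full, r2_reduced, n_cases, k_full, k_dropped):
    """Partial F statistic for dropping k_dropped predictors from the full model."""
    numerator = (r2_full - r2_reduced) / k_dropped
    denominator = (1.0 - r2_full) / (n_cases - k_full - 1)
    return numerator / denominator

# Rounded R² values from the text; n_cases=100 is an assumed sample size.
f_value = partial_f(r2_full=0.87, r2_reduced=0.79,
                    n_cases=100, k_full=5, k_dropped=1)
print(round(f_value, 1))
```

The statistic would then be compared with the critical F value for 1 and 94 degrees of freedom (roughly 3.9 at the .05 level); a much larger value is consistent with PFA contributing significantly.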

ASSESSING ACCURACY OF THE PREDICTION

Assessing accuracy of the model is best accomplished by analyzing the standard error of estimate ( SEE ) and the percentage that the SEE represents of the predicted mean ( SEE % ). The SEE represents the degree to which the predicted scores vary from the observed scores on the criterion measure, similar to the standard deviation used in other statistical procedures. According to Jackson, 10 lower values of the SEE indicate greater accuracy in prediction. Comparison of the SEE for different models using the same sample allows for determination of the most accurate model to use for prediction. SEE % is calculated by dividing the SEE by the mean of the criterion ( SEE /mean criterion) and can be used to compare different models derived from different samples.
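A minimal sketch of both quantities follows. It assumes the common degrees-of-freedom convention SEE = sqrt(SS_error / (n − k − 1)), where k is the number of predictors; the source does not state which convention was used, and the data are hypothetical.

```python
import math

def standard_error_of_estimate(observed, predicted, n_predictors):
    """SEE with the assumed n - k - 1 degrees of freedom."""
    ss_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    degrees_of_freedom = len(observed) - n_predictors - 1
    return math.sqrt(ss_error / degrees_of_freedom)

def see_percent(see, observed):
    """SEE expressed as a percentage of the mean criterion score."""
    return 100.0 * see / (sum(observed) / len(observed))

# Hypothetical observed and predicted criterion scores (illustration only).
observed = [38.0, 42.0, 35.0, 45.0, 40.0, 44.0]
predicted = [39.0, 41.0, 37.0, 44.0, 40.0, 43.0]

see = standard_error_of_estimate(observed, predicted, n_predictors=1)
print(round(see, 2), round(see_percent(see, observed), 2))
```

Because SEE% is scaled by the criterion mean, it stays comparable when two models were built on samples with different average scores.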

Bradshaw et al 3 report a SEE of 3.44 mL·kg⁻¹·min⁻¹ (approximately 1 MET) using all 5 variables in the equation (gender, age, BMI, PFA, PA-R). When the PFA variable is removed from the model, leaving only 4 variables for the prediction (gender, age, BMI, PA-R), the SEE increases to 4.20 mL·kg⁻¹·min⁻¹. The increase in the error term indicates that the model excluding PFA is less accurate in predicting VO₂max. This is confirmed by the decrease in the value for R (see discussion above). The researchers compare their model of prediction with that of George, Stone, and Burkett, 7 indicating that their model is as accurate. It is not advisable to compare models based on the SEE if the data were collected from different samples as they were in these 2 studies. That type of comparison should be made using SEE%. Bradshaw and colleagues 3 report SEE% for their model (8.62%), but do not report values from other models in making comparisons.

Some advocate the use of statistics derived from the predicted residual sum of squares ( PRESS ) as a means of selecting predictors. 2 , 4 , 16 These statistics are used more often in cross-validation of models and will be discussed in greater detail later.

ASSESSING STABILITY OF THE MODEL FOR PREDICTION

Once the most efficient and accurate model for prediction has been determined, it is prudent that the model be assessed for stability. A model, or equation, is said to be “stable” if it can be applied to different samples from the same population without losing the accuracy of the prediction. This is accomplished through cross-validation of the model. Cross-validation determines how well the prediction model developed using one sample performs in another sample from the same population. Several methods can be employed for cross-validation, including the use of 2 independent samples, split samples, and PRESS -related statistics developed from the same sample.

Using 2 independent samples involves random selection of 2 groups from the same population. One group becomes the “training” or “exploratory” group used for establishing the model of prediction. 5 The second group, the “confirmatory” or “validatory” group, is used to assess the model for stability. The researcher compares R² values from the 2 groups, and assessment of “shrinkage,” the difference between the two values for R², is used as an indicator of model stability. There is no rule of thumb for interpreting the differences, but Kleinbaum, Kupper, and Muller 12 suggest that “shrinkage” values of less than 0.10 indicate a stable model. While preferable, the use of independent samples is rare due to cost considerations.

A similar technique of cross-validation uses split samples. Once the sample has been selected from the population, it is randomly divided into 2 subgroups. One subgroup becomes the “exploratory” group and the other is used as the “validatory” group. Again, values for R² are compared and model stability is assessed by calculating “shrinkage.”
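The split-sample procedure can be sketched as follows, assuming a single predictor and a least-squares line for brevity. The data set is synthetic, the helper names are hypothetical, and the 0.10 threshold is the one quoted above.

```python
import random

def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(xs, ys, intercept, slope):
    """R² of a given line evaluated on (xs, ys)."""
    mean_y = sum(ys) / len(ys)
    ss_error = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_error / ss_total

# Synthetic sample: a linear trend plus a small deterministic disturbance.
sample = [(float(x), 2.0 * x + 5.0 + ((7 * x) % 5 - 2)) for x in range(20)]
random.Random(0).shuffle(sample)             # random division into 2 subgroups
exploratory, validatory = sample[:10], sample[10:]

ex_x, ex_y = zip(*exploratory)
va_x, va_y = zip(*validatory)
intercept, slope = fit_line(ex_x, ex_y)      # model built on one subgroup only

r2_exploratory = r_squared(ex_x, ex_y, intercept, slope)
r2_validatory = r_squared(va_x, va_y, intercept, slope)
shrinkage = r2_exploratory - r2_validatory
print(round(shrinkage, 3), "stable" if abs(shrinkage) < 0.10 else "unstable")
```

The key design point is that the coefficients are never re-estimated on the validatory subgroup; only the fit of the existing equation is measured there.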

Holiday, Ballard, and McKeown 9 advocate the use of PRESS-related statistics for cross-validation of regression models as a means of dealing with the problems of data-splitting. The PRESS method is a jackknife analysis that is used to address the issue of estimate bias associated with the use of small sample sizes. 13 In general, a jackknife analysis calculates the desired test statistic multiple times with individual cases omitted from the calculations. In the case of the PRESS method, residuals, or the differences between the actual values of the criterion for each individual and the predicted value using the formula derived with the individual's data removed from the prediction, are calculated. The PRESS statistic is the sum of the squares of the residuals derived from these calculations and is similar to the sum of squares for the error (SS error ) used in analysis of variance (ANOVA). Myers 14 discusses the use of the PRESS statistic and describes in detail how it is calculated. The reader is referred to this text and the article by Holiday, Ballard, and McKeown 9 for additional information.

Once determined, the PRESS statistic can be used to calculate a modified form of R² and the SEE. $R^2_{PRESS}$ is calculated using the following formula: $R^2_{PRESS} = 1 - (PRESS / SS_{total})$, where $SS_{total}$ equals the sum of squares for the original regression equation. 14 Standard error of the estimate for PRESS ($SEE_{PRESS}$) is calculated as follows: $SEE_{PRESS} = \sqrt{PRESS / n}$, where n equals the number of individual cases. 14 The smaller the difference between the 2 values for R² and SEE, the more stable the model for prediction. Bradshaw et al 3 used this technique in their investigation. They reported a value for $R^2_{PRESS}$ of .83, a decrease of .04 from R² for their prediction model. Using the standard set by Kleinbaum, Kupper, and Muller, 12 the model developed by these researchers would appear to have stability, meaning it could be used for prediction in samples from the same population. This is further supported by the small difference between the SEE and the $SEE_{PRESS}$, 3.44 and 3.63 mL·kg⁻¹·min⁻¹, respectively.
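The leave-one-out calculation behind PRESS can be sketched for a single-predictor model as below. The sketch assumes SEE_PRESS = sqrt(PRESS / n) and takes SS_total as the total sum of squares of the criterion; function names and data are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def press_statistics(xs, ys):
    """Return (PRESS, R²_PRESS, SEE_PRESS) from leave-one-out residuals."""
    n = len(xs)
    press = 0.0
    for i in range(n):
        # Refit the equation with case i removed, then predict case i.
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (intercept + slope * xs[i])) ** 2
    mean_y = sum(ys) / n
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return press, 1.0 - press / ss_total, math.sqrt(press / n)

# Hypothetical data (illustration only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
press, r2_press, see_press = press_statistics(xs, ys)
print(round(press, 3), round(r2_press, 3), round(see_press, 3))
```

Each case is predicted by an equation that never saw it, which is why R²_PRESS is typically a little lower than the ordinary R².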

COMPARING TWO DIFFERENT PREDICTION MODELS

A comparison of 2 different models for prediction may help to clarify the use of regression analysis in prediction. Table 1 presents data from 2 studies and will be used in the following discussion.

Table 1. Comparison of Two Non-exercise Models for Predicting CRF

Variables                          Heil et al (N = 374)     Bradshaw et al (N = 100)
Intercept                          36.580                   48.073
Gender (male = 1, female = 0)      3.706                    6.178
Age (years)                        0.558                    −0.246
Age²                               −7.81 E-3                –
Percent body fat                   −0.541                   –
Body mass index (kg·m⁻²)           –                        −0.619
Activity code (0–7)                1.347                    –
Physical activity rating (0–10)    –                        0.671
Perceived functional ability       –                        0.712
R (R²)                             .88 (.77)                .93 (.87)
SEE                                4.90 mL·kg⁻¹·min⁻¹       3.44 mL·kg⁻¹·min⁻¹
SEE%                               12.7%                    8.6%

As noted above, the first step is to select an appropriate criterion, or outcome measure. Bradshaw et al 3 selected VO₂max as their criterion for measuring cardiorespiratory fitness. Heil et al 8 used VO₂peak. These 2 measures are often considered to be the same; however, VO₂peak assumes that conditions for measuring maximum oxygen consumption were not met. 17 It would be optimal to compare models based on the same criterion, but that is not essential, especially since both criteria measure cardiorespiratory fitness in much the same way.

The second step involves selection of variables for prediction. As can be seen in Table 1, both groups of investigators selected 5 variables to use in their model. The 5 variables selected by Bradshaw et al 3 provide a better prediction based on the values for R² (.87 and .77), indicating that their model accounts for more variance (87% versus 77%) in the prediction than the model of Heil et al. 8 It should also be noted that the SEE calculated in the Bradshaw 3 model (3.44 mL·kg⁻¹·min⁻¹) is less than that reported by Heil et al 8 (4.90 mL·kg⁻¹·min⁻¹). Remember, however, that comparison of the SEE should only be made when both models are developed using samples from the same population. Comparing predictions developed from different populations can be accomplished using the SEE%. Review of values for the SEE% in Table 1 would seem to indicate that the model developed by Bradshaw et al 3 is more accurate because the percentage of the mean value for VO₂max represented by error is less than that reported by Heil et al. 8 In summary, the Bradshaw 3 model would appear to be more efficient, accounting for more variance in the prediction using the same number of variables. It would also appear to be more accurate based on comparison of the SEE%.

The 2 models cannot be compared based on stability of the models. Each set of researchers used different methods for cross-validation. Both models, however, appear to be relatively stable based on the data presented. A clinician can assume that either model would perform fairly well when applied to samples from the same populations as those used by the investigators.

The purpose of this brief review has been to demystify regression analysis for prediction by explaining it in simple terms and to demonstrate its use. When reviewing research articles in which regression analysis has been used for prediction, physical therapists should ensure that the: (1) criterion chosen for the study is appropriate and meets the standards for reliability and validity, (2) processes used by the investigators to assess both model efficiency and accuracy are appropriate, (3) predictors selected for use in the model are reasonable based on theory or previous research, and (4) investigators assessed model stability through a process of cross-validation, providing the opportunity for others to utilize the prediction model in different samples drawn from the same population.


What Is Regression Analysis? Types, Importance, and Benefits


Businesses collect data to make better decisions.

But when you count on data for building strategies, simplifying processes, and improving customer experience, more than collecting it, you need to understand and analyze it to be able to draw valuable insights. Analyzing data helps you study what’s already happened and predict what may happen in the future. 

Data analysis has many components, and while some can be easy to understand and perform, others are rather complex. The good news is that many statistical analysis software packages offer meaningful insights from data in a few steps.

You have to understand the fundamentals before using or relying on a statistical program to give accurate results because even though generating results is easy, interpreting them is another ballgame. 

While interpreting data, considering the factors that affect the data becomes essential. Regression analysis helps you do just that. With the assistance of this statistical analysis method, you can find the most important and least important factors in any data set and understand how they relate.

This guide covers the fundamentals of regression analysis, its process, benefits, and applications.

What is regression analysis? 

Regression analysis is a statistical process that helps assess the relationships between a dependent variable and one or more independent variables.

The primary purpose of regression analysis is to describe the relationship between variables, but it can also be used to:

  • Estimate the value of one variable using the known values of other variables.
  • Predict results and shifts in a variable based on its relationship with other variables. 
  • Control the influence of variables while exploring the relationship between variables.  

To understand regression analysis comprehensively, you must build foundational knowledge of the statistical concepts.

Regression analysis helps identify the factors that impact data insights. You can use it to understand which factors play a role in creating an outcome and how significant they are. These factors are called variables.

You need to grasp two main types of variables.

  • The main factor you're focusing on is the dependent variable. This variable is often measured as an outcome of analyses and depends on one or more other variables.
  • The factors or variables that impact your dependent variable are called independent variables. Variables like these are often altered for analysis. They’re also called explanatory variables or predictor variables.

Correlation vs. causation 

Causation indicates that one variable is the result of the occurrence of the other variable. Correlation suggests a connection between variables. Correlation and causation can coexist, but correlation does not imply causation. 

Overfitting

Overfitting is a statistical modeling error that occurs when a function lines up with a limited set of data points and makes predictions based on those instead of exploring new data points. As a result, the model can only be used as a reference to its initial data set and not to any other data sets.

How does regression analysis work?

For a minute, let's imagine that you own an ice cream stand. In this case, we can consider “revenue” and “temperature” to be the two factors under analysis. The first step toward conducting a successful regression statistical analysis is gathering data on the variables. 

You collect all your monthly sales numbers for the past two years and any data on the independent variables or explanatory variables you’re analyzing. In this case, it’s the average monthly temperature for the past two years.

To begin to understand whether there’s a relationship between these two variables, you need to plot these data points on a graph that looks like the following theoretical example of a scatter plot:

[Figure: scatter plot for regression analysis]

The amount of sales is represented on the y-axis (vertical axis), and temperature is represented on the x-axis (horizontal axis). Each dot represents one month's data – the average temperature and sales in that same month.

Observing this data shows that sales are higher in months when the temperature is higher. But by how much? If the temperature goes higher, how much do you sell? And what if the temperature drops?

Drawing a regression line roughly in the middle of all the data points helps you figure out how much you typically sell when it’s a specific temperature. Let’s use a theoretical scatter plot to depict a regression line: 

[Figure: scatter plot with a regression line]

The regression line describes the relationship between the independent variable and the dependent variable. It can be created using statistical analysis software or Microsoft Excel.

Your regression analysis tool should also display the equation of the line, including its slope. For example:

y = 100 + 2x + error term

On observing the formula, you can conclude that when x is 0, y equals 100, which means that at a temperature of zero you make an average of 100 sales. For every one-unit rise in the temperature, you make an average of two more sales. Provided the other variables remain constant, you can use this to predict future sales.
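The fitted line can be used directly as a prediction rule. The sketch below hard-codes the example coefficients (intercept 100, slope 2) and ignores the error term; the function name is hypothetical, and the temperature unit is whatever unit the original data used, which the example does not specify.

```python
def predicted_sales(temperature, intercept=100.0, slope=2.0):
    """Average sales predicted by y = 100 + 2x (error term omitted)."""
    return intercept + slope * temperature

print(predicted_sales(0))    # baseline sales when x is 0
print(predicted_sales(30))   # each extra unit of temperature adds 2 sales
```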

A regression line always has an error term because an independent variable cannot be a perfect predictor of a dependent variable. Deciding whether this variable is worth your attention depends on the error term – the larger the error term, the less certain the regression line. 

Types of regression analysis 

Various types of regression analysis are at your disposal, and the ones described below are among the most commonly used.

Linear regression

A linear regression model is defined as a straight line that attempts to predict the relationship between variables. It’s mainly classified into two types: simple and multiple linear regression. 

We’ll discuss those in a moment, but let’s first cover the five fundamental assumptions made in the linear regression model. 

  • The dependent and independent variables display a linear relationship.
  • The mean value of the residuals is zero.
  • The residuals are not correlated with each other across observations.
  • The residuals are normally distributed.
  • The residuals are homoscedastic – they have a constant variance.

Simple linear regression analysis 

Linear regression analysis helps predict a variable's value (dependent variable) based on the known value of one other variable (independent variable).

Linear regression fits a straight line, so a simple linear model attempts to define the relationship between two variables by estimating the coefficients of the linear equation.

Simple linear regression equation:

Y = a + bX + ϵ

Where:

  • Y – Dependent variable (response variable)
  • X – Independent variable (predictor variable)
  • a – Intercept (y-intercept)
  • b – Slope
  • ϵ – Residual (error)

In such a linear regression model, a response variable has a single corresponding predictor variable that impacts its value. For example, consider the linear regression formula:

  y = 5x + 4  

If the value of x is 3, y can take only one value: 5(3) + 4 = 19.

Multiple linear regression analysis

In most cases, simple linear regression analysis can't explain the connections between data. As the connection becomes more complex, the relationship between data is better explained using more than one variable. 

Multiple regression analysis describes a response variable using more than one predictor variable. It is used when several independent variables each have the ability to affect the dependent variable.

Multiple linear regression equation: 

Y = a + bX1 + cX2 + dX3 + ϵ

Where:

  • Y – Dependent variable
  • X1, X2, X3 – Independent variables
  • a – Intercept (y-intercept)
  • b, c, d – Slopes
  • ϵ – Residual (error)

Ordinary least squares

Ordinary Least Squares regression estimates the unknown parameters in a model. It estimates the coefficients of a linear regression equation by minimizing the sum of the squared errors between the actual values and the values predicted by the fitted line.
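For a single predictor, the least-squares coefficients have a well-known closed form: the slope is the ratio of the centered cross-products to the centered squares of x, and the intercept follows from the means. The sketch below uses hypothetical data scattered around y = 5x + 4 and checks that nudging the fitted coefficients only increases the squared error.

```python
def fit_ols_line(xs, ys):
    """Closed-form least-squares intercept and slope for Y = a + bX."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared differences between actual and predicted values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical noisy observations (illustration only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [9.2, 13.8, 19.1, 24.0, 28.9]

a, b = fit_ols_line(xs, ys)
best = sum_squared_errors(xs, ys, a, b)
print(round(a, 2), round(b, 2), round(best, 3))
print(sum_squared_errors(xs, ys, a + 0.1, b) > best)  # any other line fits worse
```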

Polynomial regression

A linear regression algorithm only works when the relationship between the data is linear. What if the data distribution was more complex, as shown in the figure below?  

[Figure: simple linear model]

As seen above, the data is nonlinear. A linear model can't be used to fit nonlinear data because it can't sufficiently define the patterns in the data.

Polynomial regression is a type of multiple linear regression used when data points are present in a nonlinear manner. It can determine the curvilinear relationship between independent and dependent variables having a nonlinear relationship.

[Figure: polynomial model]

Polynomial regression equation: 

y = b0 + b1x1 + b2x1^2 + b3x1^3 + ... + bnx1^n
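Because the equation is linear in the coefficients b0 … bn, it can be fitted with the same least-squares machinery as multiple linear regression by treating each power of x as its own predictor. The sketch below solves the normal equations with a small hand-written Gaussian elimination; the helper names and data are hypothetical, and a numerical library would normally be used instead.

```python
def solve_linear_system(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [b0, b1, ..., bn] via the normal equations."""
    size = degree + 1
    rows = [[x ** p for p in range(size)] for x in xs]  # columns 1, x, x^2, ...
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(size)]
           for i in range(size)]
    xty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(size)]
    return solve_linear_system(xtx, xty)

# Hypothetical curved data generated from y = 1 + 2x + 3x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1.0 + 2.0 * x + 3.0 * x ** 2 for x in xs]
coefficients = fit_polynomial(xs, ys, degree=2)
print([round(c, 6) for c in coefficients])
```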

Logistic regression

Logistic regression models the probability of a dependent variable as a function of independent variables. The dependent variable itself takes one of two values (0 or 1), while the model's output is a probability between 0 and 1.

Logistic regression is often used when binary data (yes or no; pass or fail) needs to be analyzed. In other words, using the logistic regression method to analyze your data is recommended if your dependent variable can have either one of two binary values.

Let’s say you need to determine whether an email is spam. We need to set up a threshold based on which the classification can be done. Using logistic regression here makes sense as the outcome is strictly bound to 0 (spam) or 1 (not spam) values.  
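A minimal sketch of this idea, assuming a single hypothetical feature (the number of suspicious words in an email) and plain gradient descent on the log-loss. It follows the coding used above, 0 for spam and 1 for not spam, with a threshold of 0.5; the data and hyperparameters are illustrative only.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit weight and bias by gradient descent on the average log-loss."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(weight * x + bias) - y for x, y in zip(xs, ys)]
        weight -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        bias -= learning_rate * sum(errors) / n
    return weight, bias

def classify(x, weight, bias, threshold=0.5):
    """Return 1 (not spam) if the predicted probability reaches the threshold."""
    return 1 if sigmoid(weight * x + bias) >= threshold else 0

# Hypothetical training data: suspicious-word counts and labels.
counts = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

weight, bias = fit_logistic(counts, labels)
print(classify(0.0, weight, bias), classify(9.0, weight, bias))
```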

Bayesian linear regression

In other regression methods, the output is derived from one or more attributes. But what if those attributes are unavailable? 

The Bayesian regression method is used when the dataset to be analyzed is small or poorly distributed, because its output is derived from a probability distribution instead of point estimates. When data is scarce, you can place a prior on the regression coefficients to compensate for the missing data. As we add more data points, the accuracy of the regression model improves.

Imagine a company launches a new product and wants to predict its sales. Due to the lack of available data, we can’t use a simple regression analysis model. But Bayesian regression analysis lets you set up a prior and calculate future projections.

Additionally, once new data from the new product sales come in, the prior is immediately updated. As a result, the forecast for the future is influenced by the latest and previous data. 
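The updating step can be sketched for the simplest possible case: a line through the origin, y = slope·x + noise, with a normal prior on the slope and a known noise variance. These modelling choices, the function name, and all numbers are assumptions made for illustration; they give the standard conjugate-normal update.

```python
def update_slope(prior_mean, prior_var, xs, ys, noise_var):
    """Posterior mean and variance of the slope after seeing (xs, ys)."""
    prior_precision = 1.0 / prior_var
    data_precision = sum(x * x for x in xs) / noise_var
    post_precision = prior_precision + data_precision
    weighted_sum = (prior_precision * prior_mean
                    + sum(x * y for x, y in zip(xs, ys)) / noise_var)
    return weighted_sum / post_precision, 1.0 / post_precision

# Before any sales data exist, the forecast rests on the prior alone.
mean, var = update_slope(2.0, 1.0, [], [], noise_var=1.0)
print(mean, var)

# As hypothetical observations arrive, the prior is updated one at a time.
for x, y in [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]:
    mean, var = update_slope(mean, var, [x], [y], noise_var=1.0)
print(round(mean, 4), round(var, 4))
```

Updating one observation at a time gives the same posterior as processing all observations in a single batch, and the shrinking variance reflects growing confidence in the estimate.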

The Bayesian technique is mathematically robust. Because of this, it doesn’t require you to have any prior knowledge of the dataset during usage. However, its complexity means it takes time to draw inferences from the model, and using it doesn't make sense when you have too much data.

Quantile regression analysis

The linear regression method estimates a variable's mean based on the values of other predictor variables. But we don’t always need to calculate the conditional mean. In some situations, we need the median, the 0.25 quantile, and so on. In cases like this, we can use quantile regression.

Quantile regression defines the relationship between one or more predictor variables and specific percentiles or quantiles of a response variable. It resists the influence of outlying observations. No assumptions about the distribution of the dependent variable are made in quantile regression, so you can use it when linear regression doesn’t satisfy its assumptions. 
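Quantile regression is usually fitted by minimizing the so-called pinball (quantile) loss rather than the squared error. The sketch below shows the idea in its simplest form, a model with no predictors at all, where the best constant for a given quantile level tau is a sample quantile. The function names and scores are hypothetical.

```python
def pinball_loss(values, prediction, tau):
    """Average quantile loss of a constant prediction at quantile level tau."""
    total = 0.0
    for y in values:
        diff = y - prediction
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(values)

def best_constant(values, tau):
    """Intercept-only quantile fit: search the data points for the minimizer."""
    return min(values, key=lambda c: pinball_loss(values, c, tau))

# Hypothetical scores with one extreme outlier.
scores = [1.0, 2.0, 3.0, 4.0, 100.0]
print(best_constant(scores, 0.5))    # the median, unaffected by the outlier
print(best_constant(scores, 0.25))   # a lower quantile
print(sum(scores) / len(scores))     # the mean, pulled up by the outlier
```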

Let's consider two students who have taken an Olympiad exam open for all age groups. Student A scored 650, while student B scored 425. This data shows that student A has performed better than student B. 

But quantile regression helps remind us that since the exam was open for all age groups, we have to factor in the student's age to determine the correct outcome in their individual conditional quantile spaces.

We know the variable causing such a difference in the data distribution. As a result, the scores of the students are compared for the same age groups.

What is regularization? 

Regularization is a technique that prevents a regression model from overfitting by including extra information. It’s implemented by adding a penalty term to the model's loss function. It allows you to keep the same number of features while shrinking the magnitude of their coefficients toward zero.

The two types of regularization techniques are L1 and L2. A regression model using the L1 regularization technique is known as Lasso regression, and the one using the L2 regularization technique is called Ridge regression.

Ridge regression

Ridge regression is a regularization technique you would use to reduce the effect of correlations between independent variables (multicollinearity) or when the number of independent variables in a set exceeds the number of observations.

Ridge regression performs L2 regularization. In such a regularization, the formula used to make predictions is the same as for ordinary least squares, but a penalty proportional to the sum of the squared regression coefficients is added to the loss being minimized. This is done so that each feature has as little effect on the outcome as possible while still fitting the data.
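As a rough illustration of the L2 penalty, the sketch below fits a ridge model with a single feature and no intercept, where the solution has a simple closed form. The data and the function name are hypothetical, and real problems with several features need a matrix solver.

```python
def ridge_slope(x, y, penalty):
    """Slope of a one-feature, no-intercept ridge fit.

    Minimizes sum((y_i - b * x_i)**2) + penalty * b**2, whose closed-form
    solution is b = sum(x*y) / (sum(x*x) + penalty).
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + penalty)


# Hypothetical centered data, roughly following y = 2x.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.9, 0.0, 2.1, 3.9]

for penalty in [0.0, 1.0, 10.0, 100.0]:
    print(f"penalty={penalty:6.1f}  slope={ridge_slope(x, y, penalty):.3f}")
```

With a penalty of zero the result is the ordinary least squares slope. As the penalty grows, the slope shrinks toward zero but never reaches it exactly.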

Lasso regression

Lasso stands for Least Absolute Shrinkage and Selection Operator. 

Lasso regression is a regularized linear regression that uses an L1 penalty, which pushes regression coefficient values toward zero and can set some of them exactly to zero. By setting the coefficients of some features to zero, it automatically selects the features it needs and avoids overfitting.

So if the dataset has high levels of multicollinearity, or when tasks such as variable selection or parameter elimination need to be automated, you can use lasso regression.
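The sketch below shows why an L1 penalty can set a coefficient exactly to zero. It covers only the one-feature, no-intercept case, where the lasso solution is a soft-thresholding step. The data and function names are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_slope(x, y, penalty):
    """Slope of a one-feature, no-intercept lasso fit.

    Minimizes 0.5 * sum((y_i - b * x_i)**2) + penalty * abs(b).
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return soft_threshold(sxy, penalty) / sxx


x = [-2.0, -1.0, 0.0, 1.0, 2.0]
strong = [-4.1, -1.9, 0.0, 2.1, 3.9]  # clearly related to x
weak = [0.2, -0.1, 0.0, 0.1, -0.1]    # almost unrelated to x

print("strong feature slope:", round(lasso_slope(x, strong, 1.0), 3))
print("weak feature slope:", lasso_slope(x, weak, 1.0))
```

The strongly related response keeps a slightly shrunken slope, while the weakly related one is set to exactly zero. This is the mechanism behind lasso's automatic feature selection.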

Regression analysis is a powerful tool used to derive statistical inferences for the future using observations from the past. It identifies the connections between variables occurring in a dataset and determines the magnitude of these associations and their significance on outcomes.

Across industries, it’s a useful statistical analysis tool because it provides exceptional flexibility. So the next time someone at work proposes a plan that depends on multiple factors, perform a regression analysis to predict an accurate outcome. 

In the real world, various factors determine how a business grows. Often these factors are interrelated, and a change in one can positively or negatively affect the other. 

Using regression analysis to judge how changing variables will affect your business has two primary benefits.

  • Making data-driven decisions: Businesses use regression analysis when planning for the future because it helps determine which variables have the most significant impact on the outcome according to previous results. Companies can better focus on the right things when forecasting and making data-backed predictions.
  • Recognizing opportunities to improve: Since regression analysis shows the relationships between variables, businesses can use it to identify areas of improvement in terms of people, strategies, or tools by observing their interactions. For example, increasing the number of people on a project might positively impact revenue growth.

Both small and large industries are loaded with an enormous amount of data. To make better decisions and eliminate guesswork, many are now adopting regression analysis because it offers a scientific approach to management.

Using regression analysis, professionals can observe and evaluate the relationship between various variables and subsequently predict this relationship's future characteristics. 

Companies can utilize regression analysis in numerous forms. Some of them:

  • Many finance professionals use regression analysis to forecast future opportunities and risks. The capital asset pricing model (CAPM), which describes the relationship between an asset's expected return and the associated market risk premium, is an often-used regression model in finance for pricing assets and discovering capital costs. Regression analysis is also used to calculate beta (β), which measures the volatility of a stock's returns relative to the overall market.
  • Insurance firms use regression analysis to forecast the creditworthiness of a policyholder. It can also help estimate the number of claims that may be raised in a specific period.
  • Sales forecasting uses regression analysis to predict sales based on past performance. It can give you a sense of what has worked before, what kind of impact it has created, and what can improve to provide more accurate and beneficial future results.
  • Another critical use of regression models is the optimization of business processes. Today, managers consider regression an indispensable tool for highlighting the areas that have the maximum impact on operational efficiency and revenues, deriving new insights, and correcting process errors.
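As an illustration of the finance use case, beta can be estimated as the slope of a regression of a stock's returns on the market's returns. The sketch below uses hypothetical monthly returns and assumes a constant risk-free rate, so raw returns give the same slope as excess returns.

```python
def beta(stock_returns, market_returns):
    """Regression slope of stock returns on market returns: cov(s, m) / var(m)."""
    n = len(market_returns)
    mean_s = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((s - mean_s) * (m - mean_m)
              for s, m in zip(stock_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    return cov / var


# Hypothetical monthly returns; the stock moves 1.5 times as much as the market.
market = [0.01, -0.02, 0.03, 0.00, 0.02]
stock = [0.015, -0.03, 0.045, 0.0, 0.03]

print("estimated beta:", round(beta(stock, market), 3))
```

A beta above 1 suggests the stock has tended to move more than the market, and a beta below 1 suggests it has moved less.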

Businesses with a data-driven culture use regression analysis to draw actionable insights from large datasets. For many leading industries with extensive data catalogs, it proves to be a valuable asset. As the data size increases, more executives lean into regression analysis to make informed business decisions with statistical significance.

While Microsoft Excel remains a popular tool for conducting fundamental regression data analysis, many more advanced statistical tools today drive more accurate and faster results. Check out the top statistical analysis software in 2023 here. 

To be included in this category, the regression analysis software product must be able to:

  • Execute a simple linear regression or a complex multiple regression analysis for various data sets.
  • Provide graphical tools to study model estimation, multicollinearity, model fits, line of best fit, and other aspects typical of the type of regression.
  • Possess a clean, intuitive, and user-friendly user interface (UI) design.

*Below are the top 5 leading statistical analysis software solutions from G2’s Winter 2023 Grid® Report. Some reviews may be edited for clarity.

1. IBM SPSS statistics

IBM SPSS Statistics allows you to predict the outcomes and apply various nonlinear regression procedures that can be used for business and analysis projects where standard regression techniques are limiting or inappropriate. With IBM SPSS Statistics, you can specify multiple regression models in a single command to observe the correlation between independent and dependent variables and expand regression analysis capabilities on a dataset.

What users like best:

"I have used a couple of different statistical softwares. IBM SPSS is an amazing software, a one-stop shop for all statistics-related analysis. The graphical user interface is elegantly built for ease. I was quickly able to learn and use it"

- IBM SPSS Statistics Review, Haince Denis P.

What users dislike:

"Some of the interfaces could be more intuitive. Thankfully much information is available from various sources online to help the user learn how to set up tests."

- IBM SPSS Statistics Review, David I.

2. Posit

To make data science more intuitive and collaborative, Posit provides users across key industries with R and Python-based tools, enabling them to leverage powerful analytics and gather valuable insights.

What users like best:

"Straightforward syntax, excellent built-in functions, and powerful libraries for everything else. Building anything from simple mathematical functions to complicated machine learning models is a breeze."

- Posit Review, Brodie G.

What users dislike:

"Its GUI could be more intuitive and user-friendly. One needs a lot of time to understand and implement it. Including a package manager would be a good idea, as it has become common in many modern IDEs. There must be an option to save console commands, which is currently unavailable."

- Posit Review, Tanishq G.

3. JMP

JMP is a data analysis software that helps make sense of your data using cutting-edge and modern statistical methods. Its products are intuitively interactive, visually compelling, and statistically profound.

What users like best:

"The instructional videos on the website are great; I had no clue what I was doing before I watched them. The videos make the application very user-friendly."

- JMP Review, Ashanti B.

What users dislike:

"Help function can be brief in terms of what the functionality entails, and that's disappointing because the way the software is set up to communicate data visually and intuitively suggests the presence of a logical and explainable scientific thought process, including an explanation of the "why." The graph builder could also use more intuitive means to change layout features."

- JMP Review, Zeban K.

4. Minitab statistical software

Minitab Statistical Software is a data and statistical analysis tool used to help businesses understand their data and make better decisions. It allows companies to tap into the power of regression analysis by analyzing new and old data to discover trends, predict patterns, uncover hidden relationships between variables, and create stunning visualizations. 

What users like best:

"The greatest program for learning and analyzing as it allows you to improve the settings with incredibly accurate graphs and regression charts. This platform allows you to analyze the outcomes or data with their ideal values."

- Minitab Statistical Software Review, Pratibha M.

What users dislike:

"The software price is steep, and licensing is troublesome. You are required to be online or connected to the company VPN for licensing, especially for corporate use. So without an internet connection, you cannot use it at all. Also, if you are in the middle of doing an analysis and happen to lose your internet connection, you will risk losing the project or the study you are working on."

- Minitab Statistical Software Review, Siew Kheong W.

5. EViews

EViews offers user-friendly tools to perform data modeling and forecasting. It operates with an innovative, easy-to-use object-oriented interface used by researchers, financial institutions, government agencies, and educators.

What users like best:

"As an economist, this software is handy as it assists me in conducting advanced research, analyzing data, and interpreting results for policy recommendations. I just cannot do without EViews. I like its recent updates that have also enhanced the UI."

- EViews Review, Thomas M.

What users dislike:

"In my experience, importing data from Excel is not easy using EViews compared to other statistical software. One needs to develop expertise while importing data into EViews from different formats. Moreover, the price of the software is very high."

- EViews Review, Md. Zahid H.

Collecting data gathers no moss.

Data collection has become easy in the modern world, but more than just gathering is required. Businesses must know how to get the most value from this data. Analysis helps companies to understand the available information, derive actionable insights, and make informed decisions. Businesses should thoroughly know the data analysis process inside and out to refine operations, improve customer service, and track performance. 

Learn more about the various stages of data analysis and implement them to drive success.

Devyani Mehta

Devyani Mehta is a content marketing specialist at G2. She has worked with several SaaS startups in India, which has helped her gain diverse industry experience. At G2, she shares her insights on complex cybersecurity concepts like web application firewalls, RASP, and SSPM. Outside work, she enjoys traveling, cafe hopping, and volunteering in the education sector. Connect with her on LinkedIn.

Regression Analysis

  • Reference work entry
  • pp 1831–1832

  • Zhi-Ping Liu

Regression model

Regression analysis is a statistical method for investigating the relationships between variables, which includes a number of techniques for modeling and analyzing several variables. The focus is on the relationship between a dependent variable and one or more independent variables (Sen and Srivastava 1990 ). Regression analysis can present how the typical value of the dependent variable changes when some of the independent variables are varied, while the other independent variables are held fixed (Hardle and Simar 2003 ).

Generally, there are two types of regression analysis according to whether the data approximate a linear function, i.e., linear regression and nonlinear regression. One very general form of the regression model is

\[ y = f(x_1, \ldots, x_p) + \epsilon \]

where \( y \) is the dependent variable, \( x_1, \ldots, x_p \) are the independent variables, \( f \) is some unknown function, and \( \epsilon \) is the error in the representation. To carry out regression analysis, the form of the function \( f \) ...

Andercut M, Kauffman SA (2008) On the sparse reconstruction of gene networks. J Comput Biol 15(1):21–30

Hardle W, Simar L (2003) Applied multivariate statistical analysis. Springer, New York

Hartemink AJ (2005) Reverse engineering gene regulatory networks. Nat Biotechnol 23:554–555

Hosmer DW, Lemeshow S (2000) Applied logistic regression. Wiley, New York

Marbach D, Prill RJ, Schaffter T, Mattiussi C, Floreano D, Stolovitzky G (2010) Revealing strengths and weaknesses of methods for gene network inference. Proc Natl Acad Sci USA 107:6286–6291

Sen AK, Srivastava MS (1990) Regression analysis. Springer, New York

Tegner J, Yeung MK, Hasty J, Collins JJ (2003) Reverse engineering gene networks: integrating genetic perturbations with dynamical modeling. Proc Natl Acad Sci USA 100:5944–5949

Wang Y, Joshi T, Zhang XS, Xu D, Chen L (2006) Inferring gene regulatory networks from multiple microarray datasets. Bioinformatics 22:2413–2420

Yeung MK, Tegner J, Collins JJ (2002) Reverse engineering gene networks using singular value decomposition and robust regression. Proc Natl Acad Sci USA 99:6163–6168

Author information

Authors and affiliations.

Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, No. 500 Caobao Road, 5, 200031, Shanghai, China

Dr. Zhi-Ping Liu

Corresponding author

Correspondence to Zhi-Ping Liu .

Editor information

Editors and affiliations.

Biomedical Sciences Research Institute, University of Ulster, Coleraine, UK

Werner Dubitzky

Department of Computer Science, University of Rostock, Rostock, Germany

Olaf Wolkenhauer

Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea

Kwang-Hyun Cho

Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA

Hiroki Yokota

Copyright information

© 2013 Springer Science+Business Media, LLC

About this entry

Cite this entry.

Liu, ZP. (2013). Regression Analysis. In: Dubitzky, W., Wolkenhauer, O., Cho, KH., Yokota, H. (eds) Encyclopedia of Systems Biology. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9863-7_397

DOI : https://doi.org/10.1007/978-1-4419-9863-7_397

Publisher Name : Springer, New York, NY

Print ISBN : 978-1-4419-9862-0

Online ISBN : 978-1-4419-9863-7

eBook Packages : Biomedical and Life Sciences Reference Module Biomedical and Life Sciences

Market Business News

What is regression analysis? Definition and examples

Regression analysis, in statistical modeling, is a way of mathematically sorting out a series of variables. We use it to determine which variables have an impact and how they relate to one another. In other words, regression analysis helps us determine which factors matter most and which we can ignore.

It also helps us determine which factors interact with each other. Furthermore, and most importantly, it helps us find out how certain we are about all the factors we are examining.

Goodness of fit, for example, is a component of regression analysis. Goodness of fit refers to how accurate expected values of a financial model are versus their actual values.

Regression analysis – a statistical measure

Regression analysis is a statistical measure that we use in investing, finance, sales, marketing, science, mathematics, etc. It tries to determine how strongly related one dependent variable is to a series of other changing variables. We usually refer to them as independent variables.

The dependent variable is the one that we focus on. Put simply, we want to know whether it is being affected, and if so, by how much, and by what.

Independent variables are the factors that may or may not affect the dependent variable. The dependent variable receives the impact, while the independent variables provide (or do not provide) the impact.

Financial and investment managers say that it helps them value assets. It also helps them understand the relationships between different variables. For example, it shows how the price of commodities relates to the shares of companies that deal in those commodities.

Regression analysis in sales

Imagine you are a sales manager and you are trying to predict next month’s figures. You know that there are dozens of factors that can impact the number. For example, the time of year or rumors that a better model is coming out soon can impact the number. In fact, there may be hundreds of factors.

Maybe work colleagues add their own variables to the mix. They might say, for example, that when it snows the company sells more. Others, on the other hand, may comment that sales take a nosedive about six weeks after a competitor’s promotion.

Regression analysis helps us determine which factors really matter and their relationships. It also helps us find out what their effects are on sales figures.

We call all these factors variables. There is a dependent variable, i.e., the main factor that we are trying to predict or understand. In your case as the sales manager, the dependent variable is monthly sales.

There are also independent variables; these are other factors which you believe may potentially have an impact on the dependent variable.

For your regression analysis, you have to gather all the information on the variables. You collect all data on your monthly sales numbers for the past quarter, half year, year, or three years. You also gather any data on the independent variables that you want to consider.

Regression analysis – example

For example, if you think snow might impact sales, you will need snowfall data for the past three years. You then plot all that information on a graph.
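Beyond plotting, the line of best fit through such data can be computed directly. The sketch below uses ordinary least squares on hypothetical monthly snowfall and sales figures. The numbers and the function name are illustrative only.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: monthly snowfall (cm) and units sold in the same month.
snowfall = [0, 5, 10, 15, 20, 25]
sales = [200, 215, 228, 247, 260, 272]

intercept, slope = fit_line(snowfall, sales)
print(f"sales = {intercept:.1f} + {slope:.2f} * snowfall")
print(f"predicted sales at 30 cm: {intercept + slope * 30:.0f}")
```

In this made-up data, each extra centimeter of snow is associated with roughly three more units sold. Whether such a relationship holds for real data is exactly what the regression is meant to test.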

In an article published in the Harvard Business Review in November 2015, A Refresher on Regression Analysis, Amy Gallo wrote:

“Most companies use regression analysis to explain a phenomenon they want to understand (e.g. why did customer service calls drop last month?); predict things about the future (e.g. what will sales look like over the next six months?); or to decide what to do (e.g. should we go with this promotion or a different one?).”

According to BusinessDictionary.com, regression analysis (RA) by definition is:

“Statistical approach to forecasting change in a dependent variable (sales revenue, for example) on the basis of change in one or more independent variables (population and income, for example).”

“Known also as curve fitting or line fitting because a regression analysis equation can be used in fitting a curve or line to data points, in a manner such that the differences in the distances of data points from the curve or line are minimized.”

History of regression analysis

French mathematician Adrien-Marie Legendre (1752-1833) published the earliest form of regression that we know of in 1805. German mathematician Johann Carl Friedrich Gauss (1777-1855) also published a piece in 1809.

Both mathematicians wrote about the ‘method of the least squares.’ The method of the least squares is a standard approach in regression analysis when there are more equations than unknowns.

Gauss and Legendre applied the method to the problem of finding out what the orbits were of various celestial bodies. They focused mainly on the orbits of comets around the Sun.

In 1821, Gauss published a further development of the theory of least squares. He included a version of what we call the Gauss-Markov theorem.

Sir Francis Galton (1822-1911), a British statistician, coined the term Regression Analysis in the 19th century. He used the term when describing people’s heights through generations. His study showed that the heights of descendants of very tall ancestors tended to move downward towards a normal average. In fact, we call this a regression toward the mean.

Galton believed that regression was only applicable when he used it to describe the biological phenomenon that he had discovered.

However, Karl Pearson (1857-1936) and George Udny Yule (1871-1951) extended his work to a more general statistical context.

By the middle of the 20th century, economists were using electromechanical desk calculators for regression analysis calculations. Up to 1970, it could take up to twenty-four hours to obtain the result from one regression.

Today, people are still actively researching regression methods. Over the past few decades, statisticians have developed new methods for:

Robust Regression

This includes robust regression and regression involving responses that correlate, such as growth curves and time series.

More Complex Regression

This includes regression in which the independent variable (the predictor) or response variables are images, curves, or graphs.

Methods that Address Data Problems

Examples include Bayesian methods for regression, non-parametric regression, and regression with a greater number of predictor variables than observations. Other examples include regression in which the predictor variables are incorrectly measured and causal inference with regression.

Video – Regression Analysis

In this Statistics is Fun video, the tutor explains what regression analysis is using simple language and easy-to-understand examples.

Copyright © SurveySparrow Inc. 2024

What is Regression Analysis? Definition, Types, and Examples

Kate Williams

Last Updated: 22 January 2024

If you want to find data trends or predict sales based on certain variables, then regression analysis is the way to go.

In this article, we will learn about regression analysis, types of regression analysis, business applications, and its use cases. Feel free to jump to a section that’s relevant to you.

  • What is the definition of regression analysis?
  • Regression analysis: FAQs
  • Why is regression analysis important?
  • Types of regression analysis and when to use them
  • How is regression analysis used by businesses
  • Use cases of regression analysis

What is Regression Analysis?

Need a quick regression definition? In simple terms, regression analysis identifies the variables that have an impact on another variable.

The regression model is primarily used in finance, investing, and other areas to determine the strength and character of the relationship between one dependent variable and a series of other variables.

Regression Analysis: FAQs

Let us look at some of the most commonly asked questions about regression analysis before we head deep into understanding everything about the regression method.

1. What does multiple regression analysis mean?

Multiple regression analysis is a statistical method that is used to predict the value of a dependent variable based on the values of two or more independent variables.

2. In regression analysis, what is the predictor variable called?

The predictor variable is the name given to an independent variable that we use in regression analysis.

The predictor variable provides information about an associated dependent variable regarding a certain outcome. At their core, predictor variables are those that are linked with particular outcomes.

3. What is a residual plot in a regression analysis?

A residual plot is a graph that shows the residuals on the vertical axis and the independent variable on the horizontal axis.

Moreover, the residual plot shows how far each data point lies (vertically) from the graph of the model's prediction equation. If the residuals are scattered randomly around zero, with points above and below the line and no visible pattern, the model is a reasonable fit for the data.
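The residuals that such a plot displays are simply observed values minus predicted values. The sketch below computes them for hypothetical data and a hypothetical fitted line y = 1 + 2x.

```python
def residuals(x, y, intercept, slope):
    """Observed value minus predicted value for each data point."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical data and a hypothetical fitted line y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3.2, 4.9, 7.1, 8.8, 11.0]

res = residuals(x, y, intercept=1.0, slope=2.0)
for xi, ri in zip(x, res):
    # Positive residuals lie above the fitted line, negative ones below it.
    print(f"x={xi}  residual={ri:+.2f}")
```

Plotting these residuals against x gives the residual plot. Here they alternate in sign around zero with no obvious trend, which is what a reasonable linear fit looks like.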

4. What is linear regression analysis?

Linear regression analysis is used to predict the value of a variable based on the value of another variable. The variable that you want to predict is referred to as the dependent variable. The variable that you are using to predict the other value is called the independent variable.

Why is Regression Analysis Important?

There are many business applications of regression analysis.

  • For any machine learning problem that involves continuous numbers, regression analysis is essential. Some of those instances could be:
      • Testing automobiles
      • Weather analysis and prediction
      • Sales and promotions forecasting
      • Financial forecasting
      • Time series forecasting
  • Regression analysis also helps you understand whether the relationship between two different variables can give way to potential business opportunities.
      • For example, if you change one variable (say delivery speed), regression analysis will tell you the kind of effect that it has on other variables (such as customer satisfaction, small value orders, etc.).
  • Regression analysis is one of the most common ways to solve prediction problems in machine learning. Plotting points on a chart and fitting the line of best fit through them helps estimate how large the prediction errors are likely to be.
  • The insights from these patterns help businesses see the kind of difference that it makes to their bottom line.

5 Types of Regression Analysis and When to Use Them

1. Linear Regression Analysis

  • This type of regression analysis is one of the most basic types of regression and is used extensively in machine learning.
  • Linear regression has a predictor variable and a dependent variable that are related to each other linearly.
  • It is therefore the right choice only when the relationship between the variables is approximately linear.

Let’s say you are looking to measure the impact of email marketing on your sales. A linear fit can be misleading here because aberrations (outliers) distort the fitted line, so check the data for them before relying on the model, especially with large data sets.

2. Logistic Regression Analysis

  • If your dependent variable has discrete values, that is, if it can take only one of two values, then logistic regression is the way to go.
  • The two values could be either 0 or 1, black or white, true or false, proceed or not proceed, and so on.
  • To show the relationship between the target and independent variables, logistic regression uses a sigmoid curve.

This type of regression is best used with large data sets in which the values of the target variable occur roughly equally often. There should not be a high correlation between the independent variables in the dataset.
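The sigmoid curve mentioned above maps any real number to a value between 0 and 1, which logistic regression interprets as a probability. A minimal sketch, with hypothetical coefficients rather than fitted ones:

```python
import math


def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x, intercept, slope):
    """Probability of the positive class under a one-feature logistic model."""
    return sigmoid(intercept + slope * x)


# Hypothetical coefficients; in practice they are estimated from data.
intercept, slope = -4.0, 0.8

for x in [0, 5, 10]:
    p = predict_probability(x, intercept, slope)
    label = 1 if p >= 0.5 else 0
    print(f"x={x:2d}  probability={p:.3f}  predicted class={label}")
```

The predicted class flips where the probability crosses 0.5, which with these coefficients happens at x = 5.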

3. Lasso Regression Analysis

  • Lasso regression is a regularization technique that reduces the model’s complexity.
  • How does it do that? By limiting the absolute size of the regression coefficients.
  • When doing so, some coefficient values can shrink all the way to exactly zero. This does not happen with ridge regression.

Lasso regression is advantageous as it performs feature selection, where it selects a subset of features from the dataset to build your model. Since it uses only the required features, lasso regression manages to avoid overfitting.

4. Ridge Regression Analysis

  • If there is a high correlation between independent variables, ridge regression is the recommended tool.
  • It is also a regularization technique that reduces the complexity of the model.

Ridge regression manages to make the model less prone to overfitting by introducing a small amount of bias known as the ridge regression penalty, with the help of a bias matrix.

5. Polynomial Regression Analysis

  • Polynomial regression models a non-linear dataset with the help of a linear model.
  • It works much like multiple linear regression, but it fits a non-linear curve and is mainly employed when the data points follow a non-linear pattern.
  • It transforms the data points into polynomial features of a given degree and then fits a linear model to those features.

Polynomial regression involves fitting the data points with a polynomial curve. Since this model is susceptible to overfitting, businesses are advised to examine the curve toward its ends, where the fit is most likely to behave oddly, so that they get accurate results.
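The transform-then-fit idea can be sketched as follows: expand each x into polynomial features, then solve an ordinary linear least squares problem in those features. The helper names and data are hypothetical, and the small Gaussian elimination solver stands in for a numerical library.

```python
def polynomial_features(x, degree):
    """Expand one value into [1, x, x**2, ..., x**degree]."""
    return [x ** d for d in range(degree + 1)]


def solve(a, b):
    """Solve the linear system a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (X^T X) coef = X^T y."""
    rows = [polynomial_features(x, degree) for x in xs]
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data generated exactly from y = 1 + 2x + 3x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]

coefficients = fit_polynomial(xs, ys, degree=2)
print("fitted coefficients:", [round(c, 6) for c in coefficients])
```

The model is still linear in its coefficients, which is why a linear solver is enough even though the fitted curve is not a straight line.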

While there are many more regression analysis techniques, these are the most popular ones.

How is regression analysis used by businesses?

Regression stats help businesses understand what their data points represent and how to use them with the help of business analytics techniques.

Using this regression model, you will understand how the typical value of the dependent variable changes when one independent variable is varied while the other independent variables are held fixed.

Data professionals use this incredibly powerful statistical tool to remove unwanted variables and select the ones that are more important for the business.

Here are some uses of regression analysis:

1. Business Optimization

  • The whole objective of regression analysis is to make use of the collected data and turn it into actionable insights.
  • With the help of regression analysis, decisions no longer need to rest on guesswork or hunches.
  • Data-driven decision-making improves the output that the organization provides.
  • Also, regression charts help organizations experiment with inputs they might not have considered earlier. Because these experiments are backed by data, their chances of success are higher.
  • When there is a lot of data available, the accuracy of the insights will also be high.

2. Predictive Analytics

  • Businesses that want to stay ahead of the competition need to be able to predict future trends. Organizations use regression analysis to understand what the future holds for them.
  • To forecast trends, data analysts predict how the dependent variables change based on the specific values given to the independent variables.
  • You can use multivariate linear regression for tasks such as charting growth plans, forecasting sales volumes, predicting inventory required, and so on.
  • Find out more about the area so that you can gather data from different sources
  • Collect the data required for the relevant variables
  • Specify and measure your regression model
  • If you have a model which fits the data, then use it to come up with predictions

3. Decision-making

  • For businesses to run effectively, they need to make better decisions and be aware of how each of their decisions will affect them. If they do not understand the consequences of their decisions, it can be difficult for their smooth functioning.
  • Businesses need to collect information about each of their departments – sales, operations, marketing, finance, HR, expenditures, budgetary allocation, and so on. Using relevant parameters and analyzing them helps businesses improve their outcomes.
  • Regression analysis helps businesses understand their data and gain insights into their operations . Business analysts use regression analysis extensively to make strategic business decisions.

4. Understanding failures

  • One of the most important things that most businesses miss is reflecting on their failures.
  • Unless they examine why a marketing campaign failed or why their churn rate increased over the last two years, they will never find ways to put it right.
  • Regression analysis provides quantitative support for this kind of decision-making.

5. Predicting Success

  • You can use regression analysis to predict the probability of success of an organization in various aspects.
  • Additionally, regression analysis draws on past and current sales data to understand and predict the future success rate.

6. Risk Analysis

  • When analyzing data, data analysts, sometimes, make the mistake of considering correlation and causation as the same. However, businesses should know that correlation is not causation.
  • Financial organizations use regression data to assess their risk and guide them to make sound business decisions.

7. Provides New Insights

  • Looking at a huge set of data will help you get new insights. But data, without analysis, is meaningless.
  • With the help of regression analysis, you can find the relationship between a variety of variables to uncover patterns.
  • For example, regression models might indicate that there are more returns from a particular seller. So the eCommerce company can get in touch with the seller to understand how they send their products.

Each of these issues has different solutions to them. Without regression analysis, it might have been difficult to understand exactly what was the issue in the first place.

8. Analyze marketing effectiveness

  • When the company wants to know if the funds they have invested in marketing campaigns for a particular brand will give them enough ROI, then regression analysis is the way to go.
  • It is possible to check the isolated impact of each of the campaigns by controlling the factors that will have an impact on the sales.
  • Businesses invest in a number of marketing channels, such as email marketing, paid ads, and Instagram influencers. Regression analysis can capture the isolated ROI of each of these channels as well as their combined ROI.


7 Use Cases of Regression Analysis

1. Credit card

  • Credit card companies use regression analysis to understand various user factors such as the consumer’s future behavior, prediction of credit balance, risk of customer’s credit default, etc.
  • All of these data points help the company implement specific EMI options based on the results.
  • This will help credit card companies take note of the risky customers.

2. Finance

  • Simple linear regression, usually fitted by Ordinary Least Squares (OLS), gives an overall rationale for placing the line of best fit among the data points.
  • One of the most common applications using the statistical model is the Capital Asset Pricing Model (CAPM) which describes the relationship between the returns and risks of investing in a security.

3. Pharmaceuticals

  • Pharmaceutical companies use the process to analyze the quantitative stability data to estimate the shelf life of a product. This is because it finds the nature of the relationship between an attribute and time.
  • Medical researchers use regression analysis to understand if changes in drug dosage will have an impact on the blood pressure of patients.

For example, researchers will administer different dosages of a certain drug to patients and observe changes in their blood pressure. They will fit a simple regression model where they use dosage as the predictor variable and blood pressure as the response variable.

4. Text Editing

  • Logistic regression is a popular choice in a number of natural language processing (NLP) tasks such as text preprocessing.
  • After this, you can use logistic regression to make claims about the text fragment.
  • Email sorting, toxic speech detection, topic classification for questions, etc, are some of the areas where logistic regression shows great results.

5. Hospitality

  • You can use regression analysis to predict and recognize the intention of users. For example, where do the customers want to go? What are they planning to do?
  • It can even make a prediction before the customer has finished typing in the search bar, based on how they started.
  • It is not possible to build such a huge and complex system from scratch. There are already several machine learning algorithms that have accumulated data and have simple models that make such predictions possible.

6. Professional sports

  • Data scientists working with professional sports teams use regression analysis to understand the effect that training regimens have on the performance of players.
  • They find out how different types of exercise, like weightlifting sessions or Zumba sessions, affect the number of points a player scores for their team (let’s say in basketball).
  • Using Zumba and weightlifting as the predictor variables, and the total points scored as the response variable, they will fit the regression model.

Depending on the final values, the analysts will recommend that a player participates in more or less weightlifting or Zumba sessions to maximize their performance.

7. Agriculture

  • Agricultural scientists use regression analysis to understand how different fertilizers affect the yield of crops.
  • For example, the analysts might use different types of fertilizer and different amounts of water on fields to understand if there is an impact on the crop’s yield.
  • Based on the final results, the agriculture analysts will change the amount of fertilizer and water to maximize the crop output.

Wrapping Up

Using regression analysis helps you separate the effects that involve complicated research questions. It will allow you to make informed decisions, guide you with resource allocation, and increase your bottom line by a huge margin if you use the statistical method effectively.



Regression Analysis: Step by Step Articles, Videos, Simple Definitions


Regression analysis is a way to find trends in data. For example, you might guess that there’s a connection between how much you eat and how much you weigh; regression analysis can help you quantify that.

Regression analysis will provide you with an equation for a graph so that you can make predictions about your data. For example, if you’ve been putting on weight over the last few years, it can predict how much you’ll weigh in ten years’ time if you continue to put on weight at the same rate. It will also give you a slew of statistics (including a p-value and a correlation coefficient) to tell you how accurate your model is. Most elementary stats courses cover very basic techniques, like making scatter plots and performing linear regression. However, you may come across more advanced techniques like multiple regression.

  • Introduction to Regression Analysis

Multiple Regression Analysis

  • Overfitting and how to avoid it
  • Related articles

Technology:

Regression in Minitab

Regression analysis: an introduction.


Best of all, you can use the equation to make predictions. For example, how much snow will fall in 2017? y = -2.2923(2017) + 4624.4 ≈ 0.8 inches.

Regression also gives you an R squared value, which for this graph is 0.702. This number tells you how good your model is. The values range from 0 to 1, with 0 being a terrible model and 1 being a perfect model. As you can probably see, 0.7 is a fairly decent model so you can be fairly confident in your weather prediction!
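To make the arithmetic concrete, here is a minimal Python sketch of the prediction step and of how an R squared value is computed. The slope is taken as negative, since that is the only sign for which the quoted coefficients give roughly 0.8 inches for 2017; the function names are illustrative.

```python
def predict_snowfall(year, slope=-2.2923, intercept=4624.4):
    """Evaluate the fitted line y = slope * year + intercept (inches of snow)."""
    return slope * year + intercept

def r_squared(y_actual, y_predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_actual) / len(y_actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_actual, y_predicted))
    ss_tot = sum((a - mean_y) ** 2 for a in y_actual)
    return 1 - ss_res / ss_tot

print(round(predict_snowfall(2017), 2))  # 0.83
```

A model that predicts every point exactly gets an R squared of 1, while one that only ever predicts the mean of the data gets 0.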


Multiple regression analysis is used to see if there is a statistically significant relationship between sets of variables . It’s used to find trends in those sets of data.

Multiple regression analysis is almost the same as simple linear regression . The only difference between simple linear regression and multiple regression is in the number of predictors (“x” variables) used in the regression.

  • Simple regression analysis uses a single “x” variable for each dependent “y” variable. For example: $(x_1, Y_1)$.
  • Multiple regression uses multiple “x” variables for each dependent “y” variable: $((x_1)_1, (x_2)_1, (x_3)_1, Y_1)$.

In one-variable linear regression, you would input one dependent variable (e.g. “sales”) against an independent variable (e.g. “profit”). But you might be interested in how different types of sales affect the regression. You could set your $X_1$ as one type of sales, your $X_2$ as another type of sales and so on.

When to Use Multiple Regression Analysis.

Ordinary linear regression usually isn’t enough to take into account all of the real-life factors that have an effect on an outcome. For example, the following graph plots a single variable (number of doctors) against another variable (life-expectancy of women).

From this graph it might appear there is a relationship between life-expectancy of women and the number of doctors in the population . In fact, that’s probably true and you could say it’s a simple fix: put more doctors into the population to increase life expectancy. But the reality is you would have to look at other factors like the possibility that doctors in rural areas might have less education or experience. Or perhaps they have a lack of access to medical facilities like trauma centers.

Including those extra factors means adding more independent variables to your regression analysis, which turns it into a multiple regression model.

Multiple Regression Analysis Output.

Regression analysis is always performed in software, like Excel or SPSS. The output differs according to how many variables you have but it’s essentially the same type of output you would find in a simple linear regression. There’s just more of it:

  • Simple regression: $Y = b_0 + b_1 x$.
  • Multiple regression: $Y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$.

The output would include a summary, similar to a summary for simple linear regression, that includes:

  • R (the multiple correlation coefficient ),
  • R squared (the coefficient of determination ),
  • adjusted R-squared ,
  • The standard error of the estimate.

These statistics help you figure out how well a regression model fits the data. The ANOVA table in the output would give you the p-value and f-statistic .
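For readers who want to see where these summary figures come from, the following is a minimal sketch using NumPy least squares. The data are randomly generated, and `regression_summary` is a hypothetical helper, not the output routine of Excel or SPSS. It leaves out the p-value, which needs an F-distribution routine.

```python
import numpy as np

def regression_summary(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by least squares and return fit statistics."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape                                  # n observations, k predictors
    design = np.column_stack([np.ones(n), X])       # prepend the intercept column
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coeffs
    ss_res = float(residuals @ residuals)           # unexplained variation
    ss_tot = float(np.sum((y - y.mean()) ** 2))     # total variation
    r2 = 1.0 - ss_res / ss_tot
    df_resid = n - k - 1                            # residual degrees of freedom
    return {
        "coefficients": coeffs,                     # b0, b1, ..., bk
        "R": max(r2, 0.0) ** 0.5,                   # multiple correlation coefficient
        "R2": r2,                                   # coefficient of determination
        "adj_R2": 1.0 - (1.0 - r2) * (n - 1) / df_resid,
        "std_error": (ss_res / df_resid) ** 0.5,    # standard error of the estimate
        "F": ((ss_tot - ss_res) / k) / (ss_res / df_resid),
    }

# Synthetic data (an assumption for illustration): two predictors plus noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(40, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=40)

summary = regression_summary(X, y)
for name, value in summary.items():
    print(name, value)
```

Adjusted R-squared is never larger than R squared, because it penalizes each extra predictor through the residual degrees of freedom.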

Minimum Sample size

“The answer to the sample size question appears to depend in part on the objectives of the researcher, the research questions that are being addressed, and the type of model being utilized. Although there are several research articles and textbooks giving recommendations for minimum sample sizes for multiple regression, few agree on how large is large enough and not many address the prediction side of MLR .” ~ Gregory T. Knofczynski

If you’re concerned with finding accurate values for the squared multiple correlation coefficient, minimizing the shrinkage of the squared multiple correlation coefficient or have another specific goal, Gregory Knofczynski’s paper is a worthwhile read and comes with lots of references for further study. That said, many people just want to run MLR to get a general idea of trends and they don’t need very specific estimates. If that’s the case, you can use a rule of thumb. It’s widely stated in the literature that you should have more than 100 items in your sample. While this is sometimes adequate, you’ll be on the safer side if you have at least 200 observations or, better yet, more than 400.

Overfitting in Regression


Overfitting is where your model is too complex for your data — it happens when your sample size is too small. If you put enough predictor variables in your regression model, you will nearly always get a model that looks significant .

While an overfitted model may fit the idiosyncrasies of your data extremely well, it won’t fit additional test samples or the overall population. The model’s p-values, R-Squared and regression coefficients can all be misleading. Basically, you’re asking too much from a small set of data.

How to Avoid Overfitting

In linear modeling (including multiple regression ), you should have at least 10-15 observations for each term you are trying to estimate. Any less than that, and you run the risk of overfitting your model. “Terms” include:

  • Interaction Effects,
  • Polynomial expression s (for modeling curved lines),
  • Predictor variables.

While this rule of thumb is generally accepted, Green (1991) takes this a step further and suggests that the minimum sample size for any regression should be 50, with an additional 8 observations per term. For example, if you have one interaction term and three predictor variables (four terms in all), you’ll need around 40-60 items in your sample to avoid overfitting, or 50 + 4(8) = 82 items according to Green.
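Both rules of thumb are simple enough to write down as functions. The sketch below is only a restatement of the two rules as described here; the function names and the five-term model are hypothetical.

```python
def rule_of_thumb_range(n_terms, low=10, high=15):
    """Sample-size range from the '10-15 observations per term' rule."""
    return low * n_terms, high * n_terms

def green_minimum(n_terms, base=50, per_term=8):
    """Green (1991): a base of 50 observations plus 8 for every term."""
    return base + per_term * n_terms

terms = 5  # hypothetical model, e.g. four predictors plus one interaction term
print(rule_of_thumb_range(terms))  # (50, 75)
print(green_minimum(terms))        # 90
```

Remember to count every interaction and polynomial term, not just the raw predictor variables, when you work out the number of terms.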

There are exceptions to the “10-15” rule of thumb. They include:

  • When there is multicollinearity in your data, or if the effect size is small. If that’s the case, you’ll need more observations per term (although there is, unfortunately, no rule of thumb for how many to add!).
  • You may be able to get away with as few as 10 observations per predictor if you are using logistic regression or survival models , as long as you don’t have extreme event probabilities, small effect sizes, or predictor variables with truncated ranges . (Peduzzi et al.)

How to Detect and Avoid Overfitting

The easiest way to avoid overfitting is to increase your sample size by collecting more data. If you can’t do that, the second option is to reduce the number of predictors in your model — either by combining or eliminating them. Factor Analysis is one method you can use to identify related predictors that might be candidates for combining.

1. Cross-Validation

Use cross validation to detect overfitting: this partitions your data, generalizes your model, and chooses the model which works best. One form of cross-validation is predicted R-squared . Most good statistical software will include this statistic, which is calculated by:

  • Removing one observation at a time from your data,
  • Estimating the regression equation for each iteration,
  • Using the regression equation to predict the removed observation.

Cross validation isn’t a magic cure for small data sets though, and sometimes a clear model isn’t identified even with an adequate sample size.
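The three steps behind predicted R-squared can be sketched in a few lines of plain Python. This is an illustrative version for a single predictor, with made-up data and hypothetical function names; statistical packages compute the same quantity more efficiently.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def total_sum_of_squares(ys):
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)

def ordinary_r_squared(xs, ys):
    """R squared of the line fitted to all of the data."""
    intercept, slope = fit_line(xs, ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / total_sum_of_squares(ys)

def predicted_r_squared(xs, ys):
    """Leave one observation out, refit, and predict the removed point."""
    press = 0.0  # prediction error sum of squares
    for i in range(len(xs)):
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (intercept + slope * xs[i])) ** 2
    return 1 - press / total_sum_of_squares(ys)

# Made-up, nearly linear data for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

print(ordinary_r_squared(xs, ys), predicted_r_squared(xs, ys))
```

Predicted R-squared is never higher than the ordinary R squared. A large gap between the two is a warning sign that the model is fitting noise.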

2. Shrinkage & Resampling

Shrinkage and resampling techniques can help you to find out how well your model might fit a new sample.

3. Automated Methods

Automated stepwise regression shouldn’t be used as an overfitting solution for small data sets. According to Babyak (2004),

“The problems with automated selection conducted in this very typical manner are so numerous that it would be hard to catalogue all of them [in a journal article].”

Babyak also recommends avoiding univariate pretesting or screening (a “variation of automated selection in disguise”), dichotomizing continuous variables — which can dramatically increase Type I errors , or multiple testing of confounding variables (although this may be ok if used judiciously).

Books:

  • Gonick, L. (1993). The Cartoon Guide to Statistics. HarperPerennial.
  • Lindstrom, D. (2010). Schaum’s Easy Outline of Statistics, 2nd Edition. McGraw-Hill Education.

Journal articles:

  • Babyak, M.A. (2004). “What you see may not be what you get: a brief, nontechnical introduction to overfitting in regression-type models.” Psychosomatic Medicine 66(3):411–21.
  • Green, S.B. (1991). “How many subjects does it take to do a regression analysis?” Multivariate Behavioral Research 26:499–510.
  • Peduzzi, P.N., et al. (1995). “The importance of events per independent variable in multivariable analysis, II: accuracy and precision of regression estimates.” Journal of Clinical Epidemiology 48:1503–10.
  • Peduzzi, P.N., et al. (1996). “A simulation study of the number of events per variable in logistic regression analysis.” Journal of Clinical Epidemiology 49:1373–9.



Fun fact: Did you know regression isn’t just for creating trendlines? It’s also a great hack for finding the nth term of a quadratic sequence.


Regression is fitting data to a line (Minitab can also perform other types of regression, like quadratic regression). When you find regression in Minitab, you’ll get a scatter plot of your data along with the line of best fit, plus Minitab will provide you with:

  • Standard Error (how far, on average, the data points fall from the fitted regression line).
  • R squared: a value between 0 and 1 which tells you how well your data points fit the model.
  • Adjusted R² (adjusts R² for the number of predictors in the model).

Regression in Minitab takes just a couple of clicks from the toolbar and is accessed through the Stat menu.

Example question : Find regression in Minitab for the following set of data points that compare calories consumed per day to weight: Calories consumed daily (Weight in lb): 2800 (140), 2810 (143), 2805 (144), 2705 (145), 3000 (155), 2500 (130), 2400 (121), 2100 (100), 2000 (99), 2350 (120), 2400 (121), 3000 (155).

Step 1: Type your data into two columns in Minitab .

Step 2: Click “Stat,” then click “Regression” and then click “Fitted Line Plot.”


Step 3: Click a variable name for the dependent value in the left-hand window. For this sample question, we want to know if calories consumed affects weight, so calories is the independent variable (X) and weight is the dependent variable (Y). Click “Weight” and then click “Select.”

Step 4: Repeat Step 3 for the independent X variable, calories.


Step 5: Click “OK.” Minitab will create a regression line graph in a separate window.

Step 6: Read the results. As well as creating a regression graph, Minitab will give you values for S, R-sq and R-sq(adj) in the top right corner of the fitted line plot window. S = standard error. R-Sq = Coefficient of Determination. R-Sq(adj) = Adjusted Coefficient of Determination (Adjusted R Squared).

That’s it!
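If Minitab isn’t at hand, the same three figures can be approximated in plain Python. This is a sketch under the assumption that weight is regressed on calories with ordinary least squares, which is what a fitted line plot does for a linear model; the variable names are illustrative and rounding may differ from Minitab’s display.

```python
# Data from the example question: calories consumed daily and weight in lb.
calories = [2800, 2810, 2805, 2705, 3000, 2500, 2400, 2100, 2000, 2350, 2400, 3000]
weight = [140, 143, 144, 145, 155, 130, 121, 100, 99, 120, 121, 155]

n = len(calories)
mean_x = sum(calories) / n
mean_y = sum(weight) / n

# Least-squares slope and intercept of weight (Y) on calories (X).
sxx = sum((x - mean_x) ** 2 for x in calories)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(calories, weight))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual and total sums of squares.
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(calories, weight))
ss_tot = sum((y - mean_y) ** 2 for y in weight)

s = (ss_res / (n - 2)) ** 0.5                       # S: standard error of the regression
r_sq = 1 - ss_res / ss_tot                          # R-sq
r_sq_adj = 1 - (1 - r_sq) * (n - 1) / (n - 2)       # R-sq(adj) with one predictor

print(f"weight = {intercept:.2f} + {slope:.4f} * calories")
print(f"S = {s:.3f}, R-sq = {r_sq:.1%}, R-sq(adj) = {r_sq_adj:.1%}")
```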


Harvard Business School Online's Business Insights Blog provides the career insights you need to achieve your goals and gain confidence in your business skills.


What Is Regression Analysis in Business Analytics?


  • 14 Dec 2021

Countless factors impact every facet of business. How can you consider those factors and know their true impact?

Imagine you seek to understand the factors that influence people’s decision to buy your company’s product. They range from customers’ physical locations to satisfaction levels among sales representatives to your competitors' Black Friday sales.

Understanding the relationships between each factor and product sales can enable you to pinpoint areas for improvement, helping you drive more sales.

To learn how each factor influences sales, you need to use a statistical analysis method called regression analysis .

If you aren’t a business or data analyst, you may not run regressions yourself, but knowing how analysis works can provide important insight into which factors impact product sales and, thus, which are worth improving.


Foundational Concepts for Regression Analysis

Before diving into regression analysis, you need to build foundational knowledge of statistical concepts and relationships.

Independent and Dependent Variables

Start with the basics. What relationship are you aiming to explore? Try formatting your answer like this: “I want to understand the impact of [the independent variable] on [the dependent variable].”

The independent variable is the factor that could impact the dependent variable . For example, “I want to understand the impact of employee satisfaction on product sales.”

In this case, employee satisfaction is the independent variable, and product sales is the dependent variable. Identifying the dependent and independent variables is the first step toward regression analysis.

Correlation vs. Causation

One of the cardinal rules of statistically exploring relationships is to never assume correlation implies causation. In other words, just because two variables move in the same direction doesn’t mean one caused the other to occur.

If two or more variables are correlated , their directional movements are related. If two variables are positively correlated , it means that as one goes up or down, so does the other. Alternatively, if two variables are negatively correlated , one goes up while the other goes down.

A correlation’s strength can be quantified by calculating the correlation coefficient , sometimes represented by r . The correlation coefficient falls between negative one and positive one.

r = -1 indicates a perfect negative correlation.

r = 1 indicates a perfect positive correlation.

r = 0 indicates no correlation.
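For a concrete feel for these three cases, the correlation coefficient can be computed directly from its definition. The sketch below uses plain Python and tiny made-up lists; `pearson_r` is an illustrative name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # 1.0: perfect positive correlation
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # -1.0: perfect negative correlation
print(pearson_r([1, 2, 3, 4], [1, -1, -1, 1])) # 0.0: no linear correlation
```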

Causation means that one variable caused the other to occur. Proving a causal relationship between variables requires a true experiment with a control group (which doesn’t receive the independent variable) and an experimental group (which receives the independent variable).

While regression analysis provides insights into relationships between variables, it doesn’t prove causation. It can be tempting to assume that one variable caused the other—especially if you want it to be true—which is why you need to keep this in mind any time you run regressions or analyze relationships between variables.

With the basics under your belt, here’s a deeper explanation of regression analysis so you can leverage it to drive strategic planning and decision-making.


What Is Regression Analysis?

Regression analysis is the statistical method used to determine the structure of a relationship between two variables (single linear regression) or three or more variables (multiple regression).

According to the Harvard Business School Online course Business Analytics , regression is used for two primary purposes:

  • To study the magnitude and structure of the relationship between variables
  • To forecast a variable based on its relationship with another variable

Both of these insights can inform strategic business decisions.

“Regression allows us to gain insights into the structure of that relationship and provides measures of how well the data fit that relationship,” says HBS Professor Jan Hammond, who teaches Business Analytics, one of three courses that comprise the Credential of Readiness (CORe) program . “Such insights can prove extremely valuable for analyzing historical trends and developing forecasts.”

One way to think of regression is by visualizing a scatter plot of your data with the independent variable on the X-axis and the dependent variable on the Y-axis. The regression line is the line that best fits the scatter plot data. The regression equation represents the line’s slope and the relationship between the two variables, along with an estimation of error.

Physically creating this scatter plot can be a natural starting point for parsing out the relationships between variables.


Types of Regression Analysis

There are two types of regression analysis: single variable linear regression and multiple regression.

Single variable linear regression is used to determine the relationship between two variables: the independent and dependent. The equation for a single variable linear regression looks like this:

$$Y = \alpha + \beta x + \varepsilon, \qquad \hat{y} = \alpha + \beta x$$

In the equation:

  • ŷ is the expected value of Y (the dependent variable) for a given value of X (the independent variable).
  • x is the independent variable.
  • α is the Y-intercept, the point at which the regression line intersects with the vertical axis.
  • β is the slope of the regression line, or the average change in the dependent variable as the independent variable increases by one.
  • ε is the error term, equal to Y – ŷ, or the difference between the actual value of the dependent variable and its expected value.

Multiple regression , on the other hand, is used to determine the relationship between three or more variables: the dependent variable and at least two independent variables. The multiple regression equation looks complex but is similar to the single variable linear regression equation:

$$Y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$$

Each component of this equation represents the same thing as in the previous equation, with the addition of the subscript k, which is the total number of independent variables being examined. For each independent variable you include in the regression, multiply the slope of the regression line by the value of the independent variable, and add it to the rest of the equation.
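The “multiply each slope by its variable and add” step can be written as a one-line sum. In the sketch below, the intercept and slopes are hypothetical numbers chosen for illustration, not estimates from real data, and `predict` is an illustrative name.

```python
def predict(alpha, betas, xs):
    """Expected value of Y: alpha + beta_1*x_1 + ... + beta_k*x_k."""
    if len(betas) != len(xs):
        raise ValueError("need exactly one x value per slope")
    return alpha + sum(b * x for b, x in zip(betas, xs))

# Hypothetical fitted values for illustration only.
alpha = 10.0
betas = [2.0, 0.5]   # slopes for two independent variables

baseline = predict(alpha, betas, [3.0, 4.0])
what_if = predict(alpha, betas, [4.0, 4.0])   # raise the first variable by one unit

print(baseline, what_if, what_if - baseline)  # 18.0 20.0 2.0
```

Raising one independent variable by one unit, with the others held fixed, changes the expected value by exactly that variable’s slope.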

How to Run Regressions

You can use a host of statistical programs—such as Microsoft Excel, SPSS, and STATA—to run both single variable linear and multiple regressions. If you’re interested in hands-on practice with this skill, Business Analytics teaches learners how to create scatter plots and run regressions in Microsoft Excel, as well as make sense of the output and use it to drive business decisions.

Calculating Confidence and Accounting for Error

It’s important to note: This overview of regression analysis is introductory and doesn’t delve into calculations of confidence level, significance, variance, and error. When working in a statistical program, these calculations may be provided or require that you implement a function. When conducting regression analysis, these metrics are important for gauging how significant your results are and how much importance to place on them.
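For readers who want a sense of what these metrics look like, here is a hedged sketch that computes three common ones for a single variable regression: R² (the share of variance explained), the standard error of the slope and the slope's t statistic. The function name `regression_diagnostics` and the data are hypothetical. Converting the t statistic to a p value requires the t distribution with n − 2 degrees of freedom, which is left to a statistics package.

```python
import math


def regression_diagnostics(xs, ys):
    """Return R^2, the slope's standard error and its t statistic for y ~ x."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need equal-length samples with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sst = sum((y - mean_y) ** 2 for y in ys)  # total variation in y
    if sxx == 0 or sst == 0:
        raise ValueError("x and y must each vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    # Unexplained variation: sum of squared residuals.
    sse = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1.0 - sse / sst
    residual_variance = sse / (n - 2)  # two parameters were estimated
    se_beta = math.sqrt(residual_variance / sxx)
    t_stat = beta / se_beta if se_beta > 0 else math.inf
    return {"r_squared": r_squared, "se_beta": se_beta, "t_stat": t_stat}


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
diagnostics = regression_diagnostics(xs, ys)
print(diagnostics)
```

A large absolute t statistic and an R² close to 1 suggest the slope is well determined by the data. How large is large enough depends on the sample size and the chosen significance level.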


Why Use Regression Analysis?

Once you’ve generated a regression equation for a set of variables, you effectively have a roadmap for the relationship between your independent and dependent variables. If you input a specific X value into the equation, you can see the expected Y value.

This can be critical for predicting the outcome of potential changes, allowing you to ask, “What would happen if this factor changed by a specific amount?”

Returning to the earlier example, running a regression analysis could allow you to find the equation representing the relationship between employee satisfaction and product sales. You could input a higher level of employee satisfaction and see how sales might change accordingly. This information could lead to improved working conditions for employees, backed by data that shows the tie between high employee satisfaction and sales.

Whether predicting future outcomes, determining areas for improvement, or identifying relationships between seemingly unconnected variables, understanding regression analysis can enable you to craft data-driven strategies and determine the best course of action with all factors in mind.



  • Volume 24, Issue 4
  • Understanding and interpreting regression analysis

  • http://orcid.org/0000-0002-7839-8130 Parveen Ali 1,2
  • http://orcid.org/0000-0003-0157-5319 Ahtisham Younas 3,4
  • 1 School of Nursing and Midwifery, University of Sheffield, Sheffield, South Yorkshire, UK
  • 2 Sheffield University Interpersonal Violence Research Group, The University of Sheffield SEAS, Sheffield, UK
  • 3 Faculty of Nursing, Memorial University of Newfoundland, St. John's, Newfoundland and Labrador, Canada
  • 4 Swat College of Nursing, Mingora, Swat, Pakistan
  • Correspondence to Ahtisham Younas, Memorial University of Newfoundland, St. John's, NL A1C 5S7, Canada; ay6133{at}mun.ca

https://doi.org/10.1136/ebnurs-2021-103425


  • statistics & research methods

Introduction

A nurse educator is interested in finding out the academic and non-academic predictors of success in nursing students. Given the complexity of educational and clinical learning environments, and the many demographic, clinical and academic factors (age, gender, previous educational training, personal stressors, learning demands, motivation, assignment workload, etc) that influence nursing students’ success, she is able to list various potential contributing factors relatively easily. Nevertheless, not all of the identified factors will be plausible predictors of increased success. Therefore, she could use a powerful statistical procedure called regression analysis to identify whether the likelihood of increased success is influenced by factors such as age, stressors, learning demands, motivation and education.

What is regression?

Purposes of regression analysis.

Regression analysis has four primary purposes: description, estimation, prediction and control. 1,2 By description, regression can explain the relationship between dependent and independent variables. Estimation means that the value of the dependent variable can be estimated from the observed values of the independent variables. 2 Regression analysis can be useful for predicting outcomes and changes in dependent variables based on the relationships between dependent and independent variables. Finally, regression makes it possible to control for the effect of one or more independent variables while investigating the relationship of one independent variable with the dependent variable. 1

Types of regression analyses

There are commonly three types of regression analyses, namely linear, logistic and multiple regression. The differences among these types are outlined in table 1 in terms of their purpose, nature of dependent and independent variables, underlying assumptions, and nature of curve. 1,3 A more detailed discussion of linear regression follows.


Comparison of linear, logistic and multiple regression

Linear regression and interpretation

Linear regression analysis involves examining the relationship between one independent variable and one dependent variable. Statistically, the relationship between one independent variable (x) and a dependent variable (y) is expressed as: y = β0 + β1x + ε. In this equation, β0 is the y intercept and refers to the estimated value of y when x is equal to 0. The coefficient β1 is the regression coefficient and denotes the estimated increase in the dependent variable for every unit increase in the independent variable. The symbol ε is a random error component and signifies the imprecision of regression, indicating that, in actual practice, the independent variables cannot perfectly predict the change in any dependent variable. 1 Multiple linear regression follows the same logic as univariate linear regression except that (a) in multiple regression there is more than one independent variable and (b) there should be non-collinearity among the independent variables.
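The non-collinearity requirement can be checked before fitting a multiple regression. The sketch below is one simple way to do so, assuming only two candidate predictors at a time: it computes the Pearson correlation between them and the corresponding variance inflation factor, which in the two-predictor case equals 1 / (1 − r²). The function names and the student data are hypothetical and are not taken from the article.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("each variable must vary")
    return cov / math.sqrt(var_a * var_b)


def variance_inflation_factor(a, b):
    """VIF for the two-predictor case: 1 / (1 - r^2); infinite if collinear."""
    r = pearson_r(a, b)
    denominator = 1.0 - r * r
    if denominator <= 0:
        return math.inf
    return 1.0 / denominator


# Hypothetical predictors for five students.
age = [18, 20, 22, 25, 30]
prior_gpa = [3.1, 2.8, 3.6, 3.0, 3.4]
years_since_school = [0, 2, 4, 7, 12]  # equals age - 18, so collinear with age

vif_ok = variance_inflation_factor(age, prior_gpa)
vif_bad = variance_inflation_factor(age, years_since_school)
print(vif_ok, vif_bad)
```

A VIF near 1 indicates little overlap between the predictors. A very large or infinite value signals that one predictor is nearly a linear function of the other, so only one of them should enter the model.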

Factors affecting regression

Linear and multiple regression analyses are affected by several factors, namely sample size, missing data and the nature of the sample. 2

A small sample may only demonstrate connections among variables with a strong relationship. Therefore, sample size must be chosen based on the number of independent variables and the expected strength of the relationship.

Many missing values in the data set may affect the sample size. Therefore, all the missing values should be adequately dealt with before conducting regression analyses.
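One simple way of dealing with missing values is listwise deletion, in which any record lacking a required value is dropped before analysis. The sketch below illustrates this under the assumption that missing values are stored as `None`. It is only one option and is not necessarily the approach the authors would recommend, since it shrinks the sample. The function name and the student records are hypothetical.

```python
def drop_incomplete_rows(records, required_fields):
    """Keep only records with a value for every required field.

    Returns the complete records and the number of records dropped.
    """
    complete = [
        row for row in records
        if all(row.get(field) is not None for field in required_fields)
    ]
    return complete, len(records) - len(complete)


# Hypothetical student records; None marks a missing value.
students = [
    {"age": 19, "motivation": 4, "score": 72},
    {"age": 22, "motivation": None, "score": 65},
    {"age": None, "motivation": 3, "score": 58},
    {"age": 25, "motivation": 5, "score": 81},
]

complete_rows, dropped = drop_incomplete_rows(
    students, ["age", "motivation", "score"]
)
print(f"kept {len(complete_rows)} rows, dropped {dropped}")
```

Reporting the number of dropped records alongside the results makes the effect of missing data on sample size visible to the reader.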

The subsamples within the larger sample may mask the actual effect of independent and dependent variables. Therefore, if subsamples are predefined, a regression within the sample could be used to detect true relationships. Otherwise, the analysis should be undertaken on the whole sample.

Building on the nurse educator’s research interest described at the beginning, let us consider a study by Ali and Naylor. 4 They were interested in identifying the academic and non-academic factors which predict the academic success of nursing diploma students. This purpose is consistent with one of the above-mentioned purposes of regression analysis (ie, prediction). Ali and Naylor’s chosen academic independent variables were preadmission qualification, previous academic performance and school type, and the non-academic variables were age, gender, marital status and time gap. To achieve their purpose, they collected data from 628 nursing students aged 15–34 years. They used both linear and multiple regression analyses to identify the predictors of student success. For analysis, they examined the relationship of academic and non-academic variables across different years of study and noted that academic factors accounted for 36.6%, 44.3% and 50.4% of the variability in academic success of students in year 1, year 2 and year 3, respectively. 4

Ali and Naylor presented the relationship among these variables using scatter plots, which are commonly used graphs for data display in regression analysis (see examples of various scatter plots in figure 1). 4 In a scatter plot, the clustering of the dots denotes the strength of the relationship, whereas the direction indicates the nature of the relationship among variables as positive (ie, an increase in one variable accompanies an increase in the other) or negative (ie, an increase in one variable accompanies a decrease in the other).


An Example of Scatter Plot for Regression.

Table 2 presents the results of regression analysis for academic and non-academic variables for year 4 students’ success. The significant predictors of student success are denoted with a significant p value. For every significant predictor, the beta value indicates the percentage increase in students’ academic success with a one unit increase in the variable.

Regression model for the final year students (N=343)

Conclusions

Regression analysis is a powerful and useful statistical procedure with many implications for nursing research. It enables researchers to describe, predict and estimate relationships and draw plausible conclusions about the interrelated variables in relation to any studied phenomena. Regression also allows for controlling one or more variables when researchers are interested in examining the relationship among specific variables. Some key considerations that may be useful for researchers undertaking regression analysis have been presented here. While planning and conducting regression analysis, researchers should consider the type and number of dependent and independent variables as well as the nature and size of the sample. Choosing the wrong type of regression analysis with a small sample may result in erroneous conclusions about the studied phenomenon.

Ethics statements

Patient consent for publication.

Not required.

  • Montgomery DC ,
  • Schneider A ,

Twitter @parveenazamali, @Ahtisham04

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests None declared.

Provenance and peer review Commissioned; internally peer reviewed.

  • Open access
  • Published: 03 September 2024

Resilience in tourism-based SMEs driven by initiatives and strategies through share value relational capital viewed from a resource-based theory perspective

  • Suherman 1 ,
  • Florentinus Pambudi Widiatmaka 1 ,
  • Fitri Kensiwi 1 ,
  • Didik Dwi Suharso 1 ,
  • Sukirno 1 ,
  • Pranyoto 1 ,
  • Susena Karona Cahya 2 ,
  • Kundori   ORCID: orcid.org/0000-0001-7447-3562 3 ,
  • Haniek Listyorini 4 ,
  • Sapto Supriyanto 4 ,
  • Pranoto 4 &
  • Sukrisno   ORCID: orcid.org/0000-0001-7527-4303 4  

Humanities and Social Sciences Communications volume 11, Article number: 1128 (2024)


  • Business and management

This study aims to build a conceptual model based on the role of share value relational capital, frugal innovation and ambidexterity in increasing the resilience of SMEs (small and medium enterprises). A quantitative research design with a structural study methodology was adopted. Data were collected by questionnaire from 132 tourism SMEs, yielding 568 usable responses for further analysis. The analysis uses logistic regression with dichotomous response and predictor variables on structured tables of count data, representing firm performance as a result of capital resources, physical resources and possibly innovation. To test the roles of share value relational capital, frugal innovation and ambidexterity outlined in the research model and hypotheses, the authors used the structural equation modelling (SEM) software Analysis of Moment Structures (AMOS) to analyse the 568 usable questionnaires. The results support all hypotheses proposed in this study and show that share value relational capital, frugal innovation and ambidexterity can act as a bridge to leverage the resilience of SMEs. The findings show three pathway strategies for increasing the resilience of SMEs: share value relational capital, frugal innovation and ambidexterity each make a separate contribution. There are several practical implications for SME managers who want to increase resilience. The originality of this research lies in the role of share value relational capital, frugal innovation and ambidexterity in bridging the influence of digital business transformation on the resilience of SMEs.

Introduction

An important challenge faced by SMEs in emerging markets is maintaining resilience (Craighead et al. 2020). Their limitations make it difficult for them to survive successive global crises (Dian et al. 2022). A new wave of research is needed to study the determinants that help companies build stronger resilience against future crises and recover quickly. There is no specific consensus in the literature on the factors that contribute to organizational resilience. However, the rapid adoption of digital technology has placed digital transformation at the forefront of the discussion of organizational resilience. Digital solutions open up new opportunities for companies, especially those in developing countries, to offer their products, to compete and innovate, and to reach international markets (AL-Khatib 2023; Bellaaj 2021).

Digital transformation has become an important need for organizations in supporting collaboration, coordination and business operations and in generating new knowledge. Changes in identity, manifested by unusual behaviour or choices, indicate changes in core cultural values. In line with researchers and theorists (e.g. Berry 1983; Nahavandi and Malekzadeh 1988; Gupta and Govindarajan 2002; Dweck 2006; Levy et al. 2007; Ormrod 2012) who agree that the mindset [of core cultural values] can be changed and improved, we are interested in and seek to explain how such a process of change takes place. Living in different countries or working in an international company is likely to provide greater opportunities to learn and unlearn cultural values. This expectation suggests that, if the above mechanism works well, mobile people and intercultural organisations might have an increased chance of achieving, for example, innovative outcomes.

This can be considered an innovation. It is a determining factor that makes an important contribution when facing threats and unexpected challenges, and it occurs when a company moves into a new form of business opportunity. It can be regarded as an important dynamic capability, because with it the company is able to survive and to handle risks and challenges as it reacts to an environment that can change progressively and unexpectedly (Messabia et al. 2022). One explanation for the resource curse (Metcalfe and Ramlogan 2006; Murshed 2004; Ross 1999) might be that firms appear to be using their abundance of capital and physical assets to set up businesses outside their core areas of expertise, at the expense of innovation. Ignorance of know-how and “creative capital” could lead them to miss opportunities to improve productivity or to introduce positive changes in their business operations and management. Prescriptions related to innovation have usually assumed a prosperity and abundance of resources that may not match the characteristics of SMEs (Tian et al. 2021). In such a situation, innovation management yields less value than the resources that have been spent (Shibin et al. 2018). Frugal innovation is an appropriate and simple solution for addressing the challenges faced by SMEs. The problematisation of the role of innovation in economic growth has been demonstrated by previous research on firms in Vietnam that are entering a transition period in which reliance on capital resources and physical assets is expected to determine the development of the firm. There is an assumption that poorly performing firms will show evidence of over-reliance on capital and physical assets (Vuong and Napier 2014).

A concept called Mindsponge has been proposed as a mechanism to explain how individuals absorb new cultural values and ideologies, integrate with multidimensional environments and adapt by integrating the new values that are closest to them (Vuong and Napier 2015). Individuals who are able to enhance their ability to observe, adapt and integrate may have the opportunity to thrive, as they may be more objectively open to the global environment and to innovative ideas (Napier and Nilsson 2008).

Frugal innovation is expected to offer a distinctive paradigm of mindset and action in entrepreneurial orientation, combining capacity building with independence in the pursuit of sustainability (Shibin et al. 2018). There is an integrative and holistic relationship between frugal innovation and resilient SMEs. This relationship can be explained by the development of products or services that are efficient and precisely matched to the needs of existing customers, which encourages SMEs to adapt to environmental changes and build dynamic capabilities. Frugal innovation also plays an essential role in helping SMEs reduce costs and accelerate the development process, which supports their ability to survive, adapt and deal with environmental changes (Rehman et al. 2019; Qu et al. 2023). From the viewpoint of dynamic capabilities, organizational learning has been advocated as a means of developing and building organizational capabilities dynamically in order to maintain and improve performance (Martin et al. 2021). Digital transformation and innovation produce a resultant of the organization's interaction with its environment, a response adjusted to the challenges, needs and problems that arise (Widiatmaka et al. 2023). The long-term success of operations and corporate strategies depends primarily on effective organizational learning, which is considered a bridge between the company and its environment.

One proposition is that horizontal cognitive thinking can extend the impact of organizational learning and its strategic role as the organization interacts with digital transformation. For frugal innovation and resilient SMEs, these specific capabilities can be realized as abilities to produce innovation (Bogatyreva et al. 2022; Lee and Kreiser 2018). In addition, these specific capabilities can produce a relationship with the environment (Karami and Tang 2019). As specific capabilities grow, they eventually need to be combined and traded off, because they interact with one another over the allocation of resources (T. Zhang et al. 2022). These considerations motivate the concept of ambidexterity, which is an attempt to manage the capabilities of SMEs and place them in the right context. Under this assumption, innovation capability is identified from the company's capabilities for exploitation and exploration (Dunlap et al. 2016; G. Zhang et al. 2018). Meanwhile, social news reflects the defense and strength that a business social network provides for the sustainability of SMEs (Karami and Tang 2019; Peng and Luo 2000). A capability is called ambidextrous when it provides an appropriate synergy of resources, reflecting the concept of integration activities (Lumpkin and Dess 1996). This integration activity will be able to become a proxy through a predetermined level of sterility. However, a company's approach to achieving ambidexterity can be divided into two, namely the combined dimension and the balanced dimension (Cao et al. 2009; Fu et al. 2020). A company achieves ambidexterity when it is able to produce a resultant of these two dimensions.

Ambidexterity is the capability that allows medium enterprises to develop and implement strategies for efficiency and innovation that can change the environmental context (Iborra et al. 2020). Several researchers have shown empirically that ambidexterity is a dynamic capability that requires the development of sensing, capture and transformation activities (Luger et al. 2018; Teece and Teece 2007; Vahlne and Jonsson 2016). In the context of medium enterprises, ambidexterity plays an essential role in integrating the contradictory objectives of exploitation and exploration (Lubatkin et al. 2006). In line with these arguments, this study proposes that ambidexterity is dynamic enough to influence the resilience of SMEs: SMEs that can manage both exploitation and exploration improve efficiency without losing the ability to develop new product and process ideas.

Exploration and exploitation differ in form, yet a company can pursue them coherently and simultaneously. The inherent challenge is resource allocation: in both the combined and the balanced dimension, the two activities tend to draw on the same SME resources, for example when the use of internal resources competes with the search for external resources. The alternative to the combined dimension, however, allows the company to separate the two activities and carry them out in stages, for example by starting with long-term exploitation and then, over a period of time, continuing with exploration. In applying these two approaches, it is very important for SMEs to take into account their particular characteristics, both their limited resources and the industry sector in which they operate and compete (Cao et al. 2009). Although much research has studied digital solutions, the literature examining the relationship between digital business transformation and the resilience of SMEs in developing countries is still very limited. This has drawn academic criticism of the shortage of quantitative and empirical research on the contribution that digital business transformation is expected to make to SME resilience (Musril et al. 2023). It confirms a continuing need for more empirical research on the role of innovation, ambidexterity and digital transformation in improving the resilience of SMEs.

To address this need, this study uses resource-based theory to synthesize a new concept called share value relational capital, which mediates the gap between digital business transformation and the resilience of SMEs. In addition, the theory of technical social systems, a derivation of resource-based theory, yields the variable digital business transformation. The same theory yields the concept of ambidexterity, which is divided into social media ambidexterity and innovation ambidexterity. These concepts are built into a new conceptual model as an effort to increase the resilience of SMEs. Studying SMEs in Indonesia is important for two reasons. First, SMEs can be the growth engine for Indonesia's prosperity, as they generate about 57% to 60% of GDP (Tambunan 2019). Second, although the SME sector is a significant contributor to GDP growth, SMEs in Indonesia are highly vulnerable in terms of sustainability (Hernita et al. 2021).

The barriers they face include obstacles to innovation such as the COVID-19 pandemic, the capital structure needed to obtain credit and the way banks market credit (Tambunan 2019). A review of previous research confirms that empirical research linking resilient organizations to digital transformation is still rather limited. Despite growing interest, in both science and practice, in adopting information technology and digital solutions, the question of how digital business transformation, innovation and ambidexterity can be combined with share value relational capital to improve SME resilience has not been explored. Even previous research showing that frugal innovation promotes SME resilience remains largely undiscussed. This study attempts to fill the existing theoretical gap by providing concepts that are expected to drive resilience in SMEs. How can SMEs initiate and design strategies to increase resilience? What factors influence efforts to build resilience? This study aims to answer these questions using the mediating role of the variable share value relational capital. Based on this formulation of the problem, the study uses a quantitative research design to explore and analyse the causal relationships between the variables digital business transformation, frugal innovation, ambidexterity, share value relational capital and the resilience of SMEs. The model is shown in Fig. 1. The research model will be tested on SMEs in the provinces of Central Java and Yogyakarta as a representation of SMEs in Indonesia.

Fig. 1 Research model. Source: authors' own research, 2023.

Literature review and hypothesis development

Organizational resilience can be considered a dynamic capability, generated through the sets of capabilities an organization possesses, that allows it to absorb unexpected change and disruptive events (Mousa and Othman 2020). It is associated with concepts such as the organization's ability to develop adaptive capacities, which aim to provide a response and enable quick recovery. Such capacity allows business continuity to survive unexpected events despite various threats (Jia et al. 2020). Flexibility and agility are expected to reflect the organization's ability to cope with change. Organizations are also expected to have resilience, which plays an important role in responding to crises and unexpected threats (Duchek 2020). Resilience consists of several aspects. One is adaptation, which allows the company to emerge from a crisis stronger than before. Another is robustness, which reflects the company's ability to maintain its function during destructive events (Madni and Jackson 2009). Continuous adaptation has increased the need for companies to radically revise the way they do business and to rethink their strategy. The aim is to build resilience, in which digital solutions currently play an essential role. Digital technology is considered a strategic solution that enables companies to cope with radically turbulent environments and improve performance (Savastano et al. 2022).

Recent studies have confirmed that the slower a company's progress in adopting digital technology, the greater the distance between the company and changes in its business environment (Corsini et al. 2023; Urbach et al. 2019). Digital business transformation can be considered a form of digitalization: an organizational change that creates new value, driven by internet services and digital technology, and that changes the way the company manages and runs its business (Urbach et al. 2019). It represents a company initiative to improve capabilities by using digital technologies to change operations and strategy (Al-Smadi 2023; Savastano et al. 2022). Although much of the literature has confirmed that digital transformation is a significant challenge for SMEs in emerging markets, it has also raised new research issues in management science. One such issue concerns the role of digital business transformation in sustaining and enhancing the resilience of SMEs, which must respond to destructive changes quickly and in a very limited time. The successive waves of digital transformation launched on a global scale have enabled new innovations. Previous literature has considered innovation to be a business capability that determines whether a company succeeds or fails relative to other companies when responding to an uncertain environment (Al-Omoush et al. 2022).

Various studies have described the role of innovation in the resilience of SMEs: innovation allows SMEs to treat the fluctuations caused by a crisis as an opportunity rather than merely a threat (Thukral 2021; Tian et al. 2021). Resource-based theory suggests that companies with adequate resources are better able to remain resilient during a crisis (Dubey 2023). Recent events have imposed more restrictions on the flow of resources on a global scale, and SMEs are responding by embracing frugal business models to gain efficiency (Craighead et al. 2020; Karmaker et al. 2021). In response to this crisis, awareness has increased among SMEs of the need for frugal innovation (Dubey 2023).

Ambidextrous viewpoints in terms of internal and external perspectives and Dynamic capability

To understand the role of ambidexterity from the point of view of companies, especially SMEs, this study combines two theories, resource-based theory and resource dependence theory, viewed through a dynamic capability perspective. Resource-based theory states that a company draws on internal productive resources that are valuable, rare, difficult to imitate and non-substitutable. These resources can be used to drive innovation and to outperform competitors in creating competitive advantage (Barney 1991; Weidong 2007). On this assumption, with full control over its internal resources, the company has the freedom to carry out autonomously the innovations needed to improve its performance (Tehseen and Sajilan 2016). Innovation in the form of reaching new customers and new markets is referred to as an exploration strategy, while providing operational excellence and services to customers is known as an exploitation strategy. Resource dependence theory, on the other hand, considers that companies, especially SMEs, are not independent (Roundy and Bayer 2019; Tehseen and Sajilan 2016).

Under this assumption, SMEs face resource limitations and require external resources that are critical to their survival. These resources make them dependent on resource providers (Pfeffer and Salancik 2003). One consequence of this dependency is a power imbalance, because the criticality of a resource determines the level of power held by its provider (Emerson 1962; Roundy and Bayer 2019).

In other words, the company relies on critical resources from external sources, in this case suppliers, competitors or the government, and most of these resources are owned by those actors. The two perspectives can be aligned through the dynamic capability perspective (Eisenhardt and Martin 2000). This perspective is informative about the continued existence of SMEs: they must be able to align their internal resources with external resources, or to make the two interact correctly (Jun-feng et al. 2017; Teece 2012). In addition, the company must be able to adjust to behavioral demands that change dynamically: the freedom or autonomy examined by resource-based theory and the control exerted by resource suppliers examined by resource dependence theory. Managing both kinds of resources leads toward ambidextrous capabilities (O’Reilly III and Tushman 2008; Tehseen and Sajilan 2016). Frugal innovation can be considered a solution to limited and scarce resources (Shibin et al. 2018). It is defined as a new way of thinking and acting under challenging, even worst-case, circumstances. It is realized by sensing and exploiting opportunities and by improvising solutions that minimize the use of resources (Radjou et al. 2012). Three characteristics distinguish frugal innovation from other types of innovation: concentration on core functionality, substantial cost reduction and an optimized performance level (Weyrauch and Herstatt 2017).

Epistemology of the new concept of share value relational capital

Previous studies have shown that human and relational capital and team performance have a moderately positive impact on business performance (Borchert and Zellmer-Bruhn 2010). For family businesses, this capital is slightly positively correlated with the firm’s financial performance (Kansikas and Murphy 2011). The direct and indirect effects of social capital on financial performance have also been studied (Sousa et al. 2021). From this perspective, different network development processes require different contributions, which can be made through mobilization and different entrepreneurial practices. Relational capital can also influence the value that consumers see in a product, especially when they purchase through corporate networks (Elango and Pattnaik 2007; Fazli-Salehi et al. 2021; Ngugi et al. 2010). The relationship between financial performance, satisfaction and long-term performance direction is relevant to various business strategies, and adapting unique forms to specific business conditions has been discussed before. Relational capital can be expressed in the learning activities that SMEs undertake to develop an export orientation and to build relationships with buyers and customers at an international level. Models of human and managerial skills can be seen as models of culture and modern rationality. While technology is designed for this purpose, relational models can be seen as social assets that increase individual satisfaction, which ultimately improves business performance (Villena et al. 2011). Furthermore, it can be assumed that relational capital has an impact on financial performance and exportability. Financial performance is strongly correlated with satisfaction and with the alignment of long-term performance to business strategies.

This is illustrated by the fact that every company can co-create added value. Because co-creation helps in planning opportunities at different levels, it is also expected to help in understanding a wide range of markets and products. It provides lasting direction for the productive forces that need to be developed and redefined. Various factors, including a company’s geographic location, depend on how specific business strategies intersect with the company’s social concerns. This can be seen as an increase in share value, while market conditions are expected to align with growth, sales and profitability. The firm’s ability to shape the development of a favorable socio-economic environment can be defined as reproduction. This approach leads to enabling environments and to opportunities for optimal resource utilization and investment in areas such as employees and suppliers. Creating share value encourages the development of key factors related to the supplier community and time. In addition, institutions and infrastructure can help improve business productivity. Thus, through a synthesis of resource-based theory and the intelligence theory of knowledge transfer, a new concept of “share value relational capital” is proposed. The concept is defined as the ability of firms to effectively provide technology to acquire and use knowledge and skills that are further developed in policy and operational practice to enhance competitiveness and improve economic and social conditions. This can be achieved by applying the capabilities of other parties, tracking progress, and generating new ideas when measuring outcomes and insights, in order to unlock new value that has the potential to improve SME resilience.

Digital business transformation and share value relational capital

Academic research has offered views on effective ways of understanding and adapting to an environment that contains risk and fluctuations (Ivanov et al. 2019). This need is considered a driving force for the dynamic development of organizational capabilities that respond adaptively to changes in the business environment (Savastano et al. 2022). From this point of view, dynamic capabilities must be integrated with organizational resources through digitization. The aim is to produce internal and external business processes of communication and collaboration and to shape strategies, operations and a structured culture. These make it possible to be flexible and agile in response to uncertainty and fluctuations. Scholars have also confirmed the intelligent nature of digital technology, which can sustain business operations and support companies in making smarter decisions that promote supply chain resilience in times of crisis (Craighead et al. 2020; Dubey 2023; Ivanov et al. 2019).

A company, as an organization, is expected to satisfy the needs and expectations of society, and this can be considered a philosophical need. Companies that lack competence in this regard cannot absorb and actively respond to the needs of society or fulfill its expectations. When these needs are not met during a crisis, the company will not survive for long. Practical digital transformation has empowered companies to continue operating, adapt to emerging risks and opportunities, and minimize the impact of the pandemic and its countermeasures (Rafique et al. 2022). The main role of a digital solution in SME resilience can take the form, among others, of relational capital. This capital is synonymous with customer capital, defined as the knowledge embedded in relationships between stakeholders. Organizations with good customer relationships have better opportunities to maintain and grow their business and to sell new products. This enables customers to gain an advantage in the supply channel (Daum 2005). Digital business transformation can improve relational capital by improving customer relationships, expanding the network of supplier partners, increasing employee engagement, and building trust and reputation through technology and digital strategies. Strategically, business organizations can strengthen relationships with stakeholders that are oriented toward long-term success and produce a competitive advantage in the digital era. Based on these arguments, the following hypothesis can be drawn:

H1: Digital business transformation positively affects share value relational capital.

Digital business transformation, share value relational capital and resilience of SMEs

Capability and innovation are both regarded as fundamental to increasing organizational resilience during a crisis (Akpan et al. 2022; Klein and Todesco 2021). Proactive digital transformation has been shown to empower companies to continue operating, adapt to emerging risks and minimize the impact of the pandemic and its countermeasures (Rafique et al. 2022). The main role of digital solutions in SME resilience has been clearly demonstrated by previous researchers (Olaleye et al. 2022). When SMEs are under crisis pressure, digitalization is essential for improving the efficiency of resource collaboration and for creating social coordination in the regenerative recovery of enterprises. Academic researchers have confirmed that a company’s survival in a crisis, and its handling of unforeseen events, relies on its ability to utilize digital technologies (Akpan et al. 2022; Corsini et al. 2023). Online platforms have become an indispensable solution because they provide valuable communication, support business relationships and enable collaboration among stakeholders. They have also allowed SMEs and their partners to face very challenging situations with a collaborative attitude, which is necessary to maintain their chances of survival. Relational capital refers to capabilities that few organizations possess. It encourages interactions that differ from the skills possessed by each SME.

Mutual trust and collaboration can be considered complementary for organizations in realizing relational capital. This depends on certain channels that the company needs to restrict. Network constraints tend to be beneficial; they relate to opportunities where a relationship is not in a suitable form to provide access to appropriate resources (Welbourne and Pardo-del-Val 2009). Share value relational capital plays a mediating role between digital business transformation and SME resilience, with resources and collaboration networks supporting trust building and reputation building. By fostering strong relationships and creating share value, relational capital can strengthen the resilience of SMEs in the context of digital business transformation and the pursuit of competitive advantage. When SMEs apply the values contained in a relational model, they can strengthen their business relationships, which improves their competitiveness in facing challenges. Share value relational capital can therefore contribute to SME resilience: it provides an advantage in the face of progressive market changes and makes it possible to take advantage of new opportunities for growth and long-term success. It also contributes to increasing the competitiveness of SMEs. Based on these arguments, the following hypotheses can be drawn:

H2: Share value relational capital positively affects the resilience of SMEs.

H3: Share value relational capital mediates the influence of digital business transformation on the resilience of SMEs.

Share value relational capital, social ambidexterity and frugal innovation

Customers fall into two familiar groups: external and internal customers. External customers are those already in the market, while internal customers are the company’s employees. The satisfaction of the two groups is usually expected to be causally linked with a holistic impact, and employee satisfaction in particular is a competitive advantage in terms of quality (Pambudi et al. 2022). A company’s ability to satisfy the needs of internal customers tends to have a positive impact on external customers. Because any problem with employee satisfaction affects internal customers, various key aspects need to be considered, including channels such as the level of customer and marketing model networks (Kamukama et al. 2011). Several items have been used to measure the relationship between public investors and customers. These include being a strategic and strong alliance that attracts ownership by many suppliers, market growth potential and long-term relationships with customers (Widiatmaka et al. 2023). The relational capital dimension can be defined as a company’s relationships (Daum 2005) with parties closely tied to various investors, such as the government, suppliers and customers, who are expected to know the external conditions of the company (Daum 2005; Kundori and Sukrisno 2023). The fundamental strategy for SME managers is to pursue value sharing, because this makes managers realize that they must take full responsibility for all activities. This is necessary because, globally, human resources are insufficient. Identifying social problems is therefore essential; it also helps in solving problems and gaining a competitive advantage (Kramer and Porter 2011).

It can be concluded that disciplined decision makers in various sectors should be given a strategic role. Companies are also expected to use accessible resources with particular characteristics to overcome social challenges, not only those oriented toward social capital. The aim is to ensure that relational capital strengthened by value sharing has access to knowledge, technology and skills. Companies are expected to develop these and to implement policies that improve economic competitiveness and produce optimal social conditions. Frugal innovation is expected to combine business innovation and information technology innovation to minimize resource use by redesigning and improving business processes, products and services (Ahuja and Morris Lampert 2001). Many information and communication technologies are expected to fulfill the principles of frugal innovation. Frugal innovation can change perspectives on business operations and revolutionize a company’s processes and operational practices. Such changes are expected to rely exclusively on relational capital. Frugal innovation can be considered able to outperform high-end innovation when using leading-edge technology. Frugal innovation makes an essential contribution to economic growth (Ojha and Ayilavarapu 2016). Case studies in India show that frugal innovation in various business sectors has a significant impact on economic growth. Innovation programmes in India have helped reduce poverty, create new jobs, provide equal employment opportunities and empower communities through education and skill development (George et al. 2012).

Previous researchers have emphasized that Industry 4.0 embraces digital transformation as a key pillar for increasing energy and minimizing the use of company resources, reducing production costs to achieve competitive advantage (Musril et al. 2023; Stroumpoulis and Kopanaki 2022). Digitization has increased efficiency and productivity and has improved the digital platforms used to implement frugal innovation effectively (Khanal et al. 2022). Digital innovation combines business innovation and information technology innovation to minimize the use of resources and to redesign and improve the quality of business processes, products and services. Many information technology applications have been developed in line with frugal innovation. Frugal innovation can change the way business is done by revolutionizing operations, and a company’s processes and organizational practices will then depend exclusively on digital transformation (X. Zhang 2018). This proposed relationship has led to a definition of digital solutions as a company’s ability to achieve its goals through frugal innovation (Ahuja and Morris Lampert 2001). Frugal innovation can be achieved by acquiring and sharing data, information and knowledge, which are essential factors in implementing it (Nassani et al. 2022). Information technology tools such as the web, social media, cloud computing, the Internet of Things and business analytics provide a foundation for obtaining valuable information and knowledge and enable companies to adopt new initiatives using frugal innovation techniques. Business social networks can give SMEs information about learning and product updates, as well as learning about competency strategies and market trends (Heirati et al. 2013).

Political social networks can give companies the latest information about the macro market, regulations, taxation and employment contracts (Agyapong and Ojo 2018). Market and political actors exert different kinds of control over companies, especially SMEs (Darnall et al. 2019). Companies face strong government pressure in the form of tax obligations and strict business rules (Park et al. 2019). By contrast, the exchange of resources with private parties takes place under free competition in the market (Darnall et al. 2019). When SMEs are able to adapt and consistently maintain good relationships with their stakeholders, they obtain abundant supplies of external resources (Darnall et al. 2019; Park et al. 2019). Relational capital can support the acquisition of the most appropriate capital from the government, which plays an important role in supporting innovation and the development of a company or industry. It can therefore be said that share value relational capital facilitates collaboration and networks that provide access to external resources from the government. Based on these assumptions, the following hypotheses can be drawn:

H4: Share value relational capital positively affects frugal innovation.

H5: Share value relational capital positively influences social media ambidexterity.

Social ambidexterity and innovation ambidexterity

The interaction between social media ambidexterity and innovation ambidexterity is explained by two underlying theories, resource-based theory and resource dependence theory (Barney 1991; Pfeffer and Salancik 2003; Tehseen and Sajilan 2016). Resource-based theory examines innovation, while resource dependence theory examines resource control as it depends on external resources. A company that focuses on its internal resources risks losing access to resources, which affects its resource dependence. On the other hand, a company that relies too much on external resources risks losing its autonomy in generating innovation. The two concepts are interconnected like two sides of a coin. To mitigate these potential risks, SMEs are expected to have the courage to face risk while adopting a proactive attitude that anticipates the problems that may occur while balancing the two dilemmas (Lumpkin and Dess 1996).

Regarding social media ambidexterity, companies face two parties with contrasting characteristics (Nofiani et al. 2021): political actors, who exert fixed control, and business actors, who are freer (Park et al. 2019). If a dimensional balance strategy is chosen, the company faces a longer path. It must first build a network with market actors to obtain resources and a stronger bargaining position, with the aim of then establishing a good relationship with political actors, or vice versa. This process needs to be shortened, because the need to innovate and improve performance requires rapid access to abundant external resources (Lee and Kreiser 2018; Teece 2012).

Therefore, the balance dimension strategy for social media ambidexterity is expected to build partnerships with the government and with competing business partners simultaneously (Nofiani et al. 2021). This makes the goals of SMEs easier to realize. Innovation can be implemented in various ways, including new production processes, improvements to existing products, new product designs, the selection of new raw materials and new product development (Todeschini et al. 2017). These innovation actions can be separated into exploration and exploitation, and the increasing demands on companies force them to practice both types simultaneously. This is called innovation ambidexterity. It can be realized by increasing knowledge of customer engagement, agility and adaptability. However, companies, in this case SMEs, also need to pay attention to risks related to disruption and to the challenges of allocating external resources. With effective social media patterns, companies can take advantage of the potential of social media while building defenses that maintain a balanced approach to innovation ambidexterity. Based on these assumptions, the following hypothesis can be drawn:

H6: Social media ambidexterity positively affects innovation ambidexterity.

Innovation ambidexterity and resilience of SMEs

On the other hand, the innovation dimension, in the form of exploration and exploitation, and the social network dimension, formed from business and politics, interact dynamically. Their combination can support a combined strategy (Dunlap et al. 2016). When a company transforms its internal resources into innovative products, processes or services to improve performance, it gains a stronger bargaining position that is highly valued by its stakeholders (Pfeffer and Salancik 2003). This improved bargaining position can then accelerate the acquisition of external resources, because stakeholders may grant trust rather than impose strict control (Nahapiet and Ghoshal 1998). These two sequential processes are strongly influenced by the company’s tendency to proactively find new ways to orchestrate available resources (Lumpkin and Dess 1996; Teece 2012). It can thus be concluded that a company with a high ability to use its internal resource advantages through innovation ambidexterity (exploration and exploitation) will benefit from an abundant supply of resources obtained from its earlier ambidextrous social networking activities. The combined dimension of ambidexterity positively affects the resilience of SMEs. Innovation ambidexterity generally has a positive impact on SME resilience by balancing risk-taking, enabling agility, diversifying revenue streams, enhancing customer value propositions and encouraging a culture of innovation, all of which provide a competitive advantage. SMEs with innovation ambidexterity are better prepared, because they can navigate uncertainty by using opportunities to face challenges. This ultimately increases their resilience and long-term sustainability. Based on these assumptions, the following hypothesis can be drawn:

H7: Innovation ambidexterity positively affects the resilience of SMEs.

Frugal innovation and resilience of SMEs

Innovation can be considered a dynamic capability that is valuable to companies, especially when unexpected situations occur (Karmaker et al. 2021). It has become a vital tool for organizational survival in a world characterized by dwindling resources, intense competition, rapid change and successive unexpected crises (Karmaker et al. 2021). Innovation is a characteristic that distinguishes successful businesses and protects their survival. Academic researchers studying unusual crises have also reported findings on strengthening organizational resilience through the development and use of dynamic organizational capacities, which require effective innovation and unprecedented regulatory responses (Korber and McNaughton 2017; Weaven et al. 2021). Innovation capability can generate innovation and develop SMEs. It drives their survival and can serve as a tool for controlling their growth. The best way for SMEs to progress in the face of challenges is to use their resources to produce innovations, which allows them to continue growing (Isichei et al. 2020). The frugal innovation approach is considered quite effective in dealing with unexpected and extraordinary crises. Innovation programs can be considered an extraordinary technique for generating new ways of thinking and acting, and one of the simplest and most suitable solutions, especially under crisis conditions such as the recent pandemic (Corsini et al. 2023; Dubey 2023; Vesci et al. 2021).

Digital transformation has an indirect relationship with frugal innovation, and frugal innovation has a significant impact on the adaptability and dynamic capabilities of a company. One study illustrates this with questionnaire data collected from 214 owners and managers of small and medium enterprises in an emerging market. Frugal innovation is relevant to SME resilience, which is the power to survive and adapt to uncertain environmental change. Frugal innovation can improve SME resilience through cost optimization, increased market power, greater adaptability, optimized resources and competitive advantage. This contributes to the sustainability of SMEs. Based on these assumptions, SMEs are expected to understand innovation programs so that they are better prepared to face challenges and seize opportunities to thrive in a dynamic and uncertain business environment. Based on these arguments, the following hypothesis can be drawn:

H8: Frugal innovation positively affects the resilience of SMEs.

Methodology

We selected tourism SMEs in the provinces of Central Java and the Special Region of Yogyakarta to test our research model for several reasons. First, the resource constraints of tourism SMEs in these provinces limit their ability to adopt innovations and make investments, so this research is expected to contribute more to their competitiveness. Second, uncertainty in the business environment makes tourism SMEs more vulnerable to change because of their limited resources and lower bargaining power. In addition, a lack of marketing visibility is an obstacle to increasing their resilience, because less effective marketing makes it difficult for them to reach a wider range of target markets. The sample comprises the following categories of SMEs: 42 hotels, 61 culinary businesses, 13 tourist villages, 12 trade tours and 4 travel agents. To determine the sample size we refer to Hair et al. (2006), who state that a structural equation model (SEM) with 5 or more constructs requires a minimum sample size of 100. To achieve a statistical power level of 0.95, we used the sample size calculation of Soper (2006), based on 5 variables and 27 indicators with a probability of 0.05; under this rule the minimum sample size is 292 (Hair et al. 2006; Soper 2006). Since our actual sample size is above 300, this level of adequacy has been achieved. Data were collected through questionnaires and structured interviews.

We interviewed the managers of the SMEs using questionnaires, and their responses were recorded directly on the questionnaire sheets provided. From the 132 tourism SMEs in the sample, which are engaged in the tourism sector, restaurants and travel agencies, we obtained 584 respondents. Sixteen respondents were excluded during screening at the data processing stage because their answers to the questionnaire items tended not to vary. The final data set therefore contains 568 respondents, owners and managers of these SMEs, which is still above the minimum sample size required for hypothesis testing.
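The sample arithmetic and the screening step described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the text only says that excluded responses "tended not to change", so the rule below (every item received the same score) is an assumption, and the `demo` answers are hypothetical.

```python
# Counts and thresholds as reported in the text.
SME_COUNTS = {
    "hotels": 42,
    "culinary": 61,
    "tourist villages": 13,
    "trade tours": 12,
    "travel agents": 4,
}
RESPONDENTS_COLLECTED = 584
RESPONDENTS_EXCLUDED = 16
MIN_SAMPLE_SIZE = 292  # minimum reported from Soper (2006)


def is_straight_lined(responses):
    """Assumed screening rule: the respondent gave one score to every item."""
    return len(set(responses)) == 1


def screen(respondents):
    """Keep only respondents whose answers vary across items."""
    return [r for r in respondents if not is_straight_lined(r)]


total_smes = sum(SME_COUNTS.values())
final_sample = RESPONDENTS_COLLECTED - RESPONDENTS_EXCLUDED
is_adequate = final_sample >= MIN_SAMPLE_SIZE
print(total_smes, final_sample, is_adequate)

# Hypothetical answers on the 1-10 scale, for illustration only.
demo = [[7, 7, 7, 7], [3, 8, 6, 9], [10, 10, 10, 10]]
print(screen(demo))
```

A stricter or looser rule (for example, flagging very low variance rather than zero variance) would change how many respondents are removed, so the threshold should be reported alongside the final sample size.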

Scale and measurement

To determine the scale and measurement in this study we used the encoding technique proposed by Nunnally and Bernstein (1994). To capture information from respondents easily, we used a bipolar numerical scale that assigns each response a number between 1 and 10, where 1 means strongly disagree and 10 means strongly agree. The digital business transformation variable is measured based on Ruel et al. (2020) and X. Zhang (2018). The frugal innovation variable is measured based on Kun (2022) and Zhang (2018), the social media ambidexterity variable based on Kun (2022) and X. Zhang (2018), and the innovation ambidexterity variable based on Jansen et al. (2006). The share value relational capital variable is measured based on Kramer and Porter (2011), Martins (2023) and Pambudi Widiatmaka et al. (2023), and the resilience of small and medium enterprises variable based on Koronis and Ponis (2018) and Zhang et al. (2022).

Validity and reliability of the measurement scale

The confirmatory factor analysis performed in this study was used to assess validity and give a brief overview of the indicators to be used. Where the distribution of the data did not meet the normality criteria, we followed the procedure for transforming non-normal data in Tabachnick et al. (2013) and applied the formula Xn = 1/(k-X), which yields a normalized data distribution. This procedure follows Arbuckle (2016) and Tabachnick et al. (2013). The average variance extracted (AVE) was used to assess the quality of the items in each construct: the AVE of every variable lies above the 0.5 threshold, and all items have high standardized factor loadings (Bagozzi and Yi 1988). We apply the reliability criterion of Arbuckle (2016), under which each variable is expected to have a reliability greater than 0.7. According to the measurement results, the reliability of every research variable is greater than 0.7, as shown in Table 1; the goodness-of-fit indices are shown in Table 2.
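
The transformation and the two measurement criteria can be illustrated with a short sketch. Several things here are assumptions: the text does not define k, so the code uses the common convention of the largest score plus one; it does not say whether "reliability" means composite reliability or another coefficient, so the code uses the conventional composite reliability formula based on standardized loadings; and the loadings are invented for illustration.

```python
def reflect_inverse(values, k=None):
    """Apply Xn = 1/(k - X) to each value.

    Assumption: when k is not given, use max(values) + 1 so that
    k - X is never zero.
    """
    if k is None:
        k = max(values) + 1
    return [1.0 / (k - x) for x in values]


def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings of one construct."""
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """Conventional composite reliability from standardized loadings."""
    total = sum(loadings)
    error = sum(1.0 - l * l for l in loadings)
    return total ** 2 / (total ** 2 + error)


# Hypothetical standardized loadings for one construct.
loadings = [0.7, 0.8, 0.9]
ave = average_variance_extracted(loadings)
cr = composite_reliability(loadings)
passes = ave > 0.5 and cr > 0.7  # thresholds stated in the text

transformed = reflect_inverse([1, 5, 10])  # k defaults to 11
```

With these made-up loadings both criteria are met; a construct with weaker loadings (for example, all near 0.6) would fall below the 0.5 AVE threshold.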

The model proposed in this study and our hypotheses were tested with structural equation modeling (SEM) in AMOS 25. We chose this technique for several reasons. The first is that SEM is a system of equations in which the same variable can be a regressor or predictor in one equation and a criterion in another, which suits the proposed research model (Nachtigall et al. 2003). The second is that it allows researchers to answer a series of interrelated questions in a single systematic and comprehensive analysis, modeling the relationships between multiple independent and dependent theoretical constructs simultaneously (Tarka 2018). The third is that SEM can test mediation processes simultaneously (Tabachnick et al. 2013).

To test the model and hypotheses proposed in this study, the statistical analysis was conducted in three stages. The first stage is the fit test, which evaluates the feasibility and acceptance of the research model. The results were a chi-square value of 208.904 (significance 0.00), GFI 0.956, NFI 0.942, CFI 0.959, TLI 0.961 and RMSEA 0.03. According to Arbuckle (2016) and Tabachnick et al. (2013), this evaluation results in acceptance of the model, so the analysis proceeds to test the proposed hypotheses.
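
As a rough illustration of how the reported indices can be checked, the sketch below compares them with cut-off values. The cut-offs (0.90 for GFI, NFI, CFI and TLI; 0.08 for RMSEA) are widely used rules of thumb and are an assumption here, since the text does not list the thresholds it applied. The chi-square statistic is left out because its criterion is not stated either.

```python
# Fit indices as reported in the text.
reported = {"GFI": 0.956, "NFI": 0.942, "CFI": 0.959, "TLI": 0.961, "RMSEA": 0.03}

# Assumed rule-of-thumb cut-offs: (direction, threshold).
cutoffs = {
    "GFI": ("min", 0.90),
    "NFI": ("min", 0.90),
    "CFI": ("min", 0.90),
    "TLI": ("min", 0.90),
    "RMSEA": ("max", 0.08),
}


def meets_cutoff(name, value):
    """Check one index against its assumed cut-off."""
    direction, threshold = cutoffs[name]
    if direction == "min":
        return value >= threshold
    return value <= threshold


results = {name: meets_cutoff(name, value) for name, value in reported.items()}
model_acceptable = all(results.values())
```

Under these assumed cut-offs every reported index passes, which is consistent with the acceptance of the model described above.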

Hypothesis testing

The second stage is hypothesis testing, reported in Table 3. A hypothesis is significant and accepted when the probability value is less than 0.05 and the critical ratio (CR) is more than 1.96. For this test, the confirmatory factor analysis model was transformed into a structural model. The regression coefficients for the paths of the hypotheses are H1: 0.461, H2: 0.882, H4: 0.617, H5: 0.706, H6: 0.708, H7: 0.782 and H8: 0.781, each with a critical ratio greater than 2.0, which shows that all of these hypotheses are accepted. The third stage is to determine the mediation effect specified in the hypotheses developed earlier.
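
The decision rule can be sketched as follows. The critical ratio is the path estimate divided by its standard error, and the two-tailed probability is taken from the standard normal distribution. The standard errors are not reported in the text, so the value 0.1 below is purely hypothetical and the resulting CR is not the study's CR.

```python
import math


def critical_ratio(estimate, std_error):
    """Critical ratio: estimate divided by its standard error."""
    return estimate / std_error


def two_tailed_p(z):
    """Two-tailed p-value of z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def is_supported(estimate, std_error, alpha=0.05, cr_cutoff=1.96):
    """Apply the rule from the text: p < 0.05 and CR > 1.96."""
    cr = critical_ratio(estimate, std_error)
    return abs(cr) > cr_cutoff and two_tailed_p(cr) < alpha


# H1 coefficient from the text with a hypothetical standard error.
h1_cr = critical_ratio(0.461, 0.1)
h1_supported = is_supported(0.461, 0.1)
```

The two conditions are nearly equivalent: a critical ratio of 1.96 corresponds to a two-tailed probability of about 0.05, so in practice either one decides the outcome.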

Mediation testing

This study uses the Sobel test to test the proposed mediation hypothesis. A mediation effect is significant when the z value is greater than 1.96 and the probability value is below 0.05. The mediation effect in the third hypothesis (H3) was tested using software following Hayes et al. (2009). The test yields Z = 6.179634 for H3, which shows that there is a mediating effect from the variables tested. The mediation hypothesis proposed in this study is therefore accepted.
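
For readers unfamiliar with the Sobel test, a minimal sketch of the standard statistic follows. Here a is the path from the predictor to the mediator and b is the path from the mediator to the outcome, each with its standard error. The text does not report these four inputs, so the values below are hypothetical and will not reproduce the reported Z of 6.179634.

```python
import math


def sobel_z(a, se_a, b, se_b):
    """Sobel z = a*b / sqrt(b^2 * se_a^2 + a^2 * se_b^2)."""
    return (a * b) / math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)


def mediation_significant(z, z_cutoff=1.96):
    """Rule from the text: mediation is significant when z exceeds 1.96."""
    return abs(z) > z_cutoff


# Hypothetical path coefficients and standard errors.
z_example = sobel_z(a=0.5, se_a=0.1, b=0.4, se_b=0.1)

# The z value reported in the text for H3.
h3_significant = mediation_significant(6.179634)
```

The formula shows why the test is sensitive to precision: larger standard errors on either path shrink z even when the indirect effect a*b is unchanged.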

To the best of our knowledge, no literature has attempted to build and confirm a model based on the strategic relationship between share value relational capital, innovation and ambidexterity to increase SME resilience. The concept of share value relational capital is expected to play a role in bridging digital business transformation and improved SME resilience. With the proposed hypotheses accepted, conclusions can be drawn about different strategic paths for increasing SME resilience. The first path moves from digital business transformation to share value relational capital to SME resilience. Digital business transformation must be used as a digital technology capability for increasing relational capital, which will contribute to the formation of share value relational capital. Digital business transformation can be seen as an important strategic input. However, if this input cannot be used as an impetus to build relational capital from the point of view of SME resilience, that resilience cannot be created. This study confirms the conceptual proposal of relational capital in resource-based theory (Amit and Schoemaker 1993; Barney 1991; Hamel and Prahalad 1990; Long and Vickers-Koch 1995; Mata et al. 1995; Wernerfelt 1995). Knowledge must not only be created and shared; it must also be transferred and articulated correctly to the consumers with whom relational capital exists.

The impact of this digital business transformation on relational capital tends to be positive for SME resilience. There is some contribution to SME resilience from initiatives such as improved communication, expansion of customer engagement networks, data-driven decision making, and increased agility and branding. This is expected to create a form of mutually supportive relationship. The aim is to provide easy access to resources, build a loyal customer base and adapt to various changes. It can be leveraged to create a competitive advantage for resilient SMEs.

The second strategic path runs from digital business transformation to share value relational capital to frugal innovation to SME resilience. Share value relational capital is based on the concept of relational capital, and this concept must underpin the building of frugal technology innovation. Logically, we can hope to improve the resilience of SMEs if frugal technology can be achieved. This means that digital business transformation must underpin how SMEs try to survive. If digital business transformation can be used to emphasize relational capital, which may involve creating new knowledge, absorbing social problems, sharing knowledge and tracking progress, it can facilitate the emergence of SME self-restructuring. Such SMEs will execute their business processes, adapt to uncertainties and provide effective risk management during a crisis. In addition, the ability to maintain a conducive and deep-rooted relationship with various business partners also emerges. SMEs are expected to provide access to knowledge, market insight, cost optimization, agility, collaboration and competitive differentiation, all of which are expected to contribute to their competitiveness. Logically, without such competitiveness, managers of SMEs cannot develop cost-effective solutions, adapt to competitive market changes or develop an attitude that promotes strong relationships with different stakeholders.

Our study confirms the third strategic path, from digital business transformation to share value relational capital to social media ambidexterity to innovation ambidexterity to SME resilience. This study shows that digital business transformation can support the concept of shared value by generating a variety of new knowledge and the ability to absorb various social issues. This is expected to support a concept that will determine whether SMEs succeed in surviving. It is our contention that when ambidexterity is optimally applied to social media, innovation effects can be created (Kun 2022; X. Zhang 2018), which gives maximal support to SME resilience. In summary, if SMEs are able to adopt various digital technologies and build relationships that make effective use of social media, this will improve their ability to develop adaptive and innovative approaches and face different challenges. These solutions will make tourism SMEs more resilient in dynamic and high-risk business contexts.

The acceptance of all the hypotheses in this study was largely to be expected. It gives the management of SMEs insight into how to build a resilient SME based on innovation and ambidexterity. The results of this research will be of more substantial value if digital business transformation is able to increase the role of the new concept of share value relational capital, and if share value relational capital can in turn increase frugal innovation and ambidexterity in the innovation and social context. Frugal innovation and ambidexterity are strong enough to become an asset for SMEs to increase their resilience.

Research implications

The findings of this study contribute to knowledge of the theoretical implications of applying resource-based theory in management science to SMEs. The first implication is the novelty of synthesizing a new concept, share value relational capital, rooted in Barney (1991). This is expected to strengthen the assumption that digital business transformation is the most important capital SMEs need to own in order to create relational capital. The findings show that dynamic capability is the basis for the emergence of the innovation and ambidexterity variables. Through share value relational capital, this is expected to become a core form of the ambidexterity of a large firm, which is applicable to the management concepts of resource-based and resource-dependency theories. This contribution gives management science a position on SMEs: they can continue to innovate and adopt various digital technologies. If an SME is able to generate distinct relational capital with its customers, the resulting process of relational capital and value sharing can be considered part of dynamic capability, which is the key to the survival of SMEs. Mutual trust and cooperation can be seen as a complement to the organization in realizing social capital. This is a concrete response to how SMEs are expected to adopt the philosophy of resource-based theory in improving their resilience.

Managerial insights

The present study is consistent with the mediating role of share value relational capital in improving the competitiveness and resilience of SMEs. This can be identified from the role of the new concept of share value relational capital, through which SMEs can absorb social problems. This has implications for how managers of SMEs adapt to deal with uncertainty and disruption in a more agile way.

Moreover, they will also be able to respond quickly to overcome problems related to uncertain risk. SMEs are also expected to create and share new knowledge in order to measure and exploit what they know. This is expected to provide the various resources needed when SMEs experience crisis situations, so that during a crisis they can maintain smooth operations and relationships. SMEs are also expected to be able to absorb, coordinate and integrate different business cases and to have deep-rooted relationships with business partners. Overall, we conclude that SMEs need to realize the importance of share value relational capital and the ability to build a variety of strong relationships with related parties, so that they can easily access the necessary resources. Moreover, support for innovation, adaptation and solidarity will better build the reputation of SMEs. SMEs need all these factors, and they will need to continue to adapt to survive and thrive in a changing and challenging business environment characterized by high uncertainty and global insecurity.

Limitations and future research

Future work can explain how the process of share value relational capital should be managed as a mechanism for improving the competitiveness of SMEs, and in particular its function in increasing their resilience. The treatment of the new concept of share value relational capital presented in this paper is only preliminary, and further development is still needed, especially in establishing the dimensions and construction of the concept. These dimensions are needed as an instrument of strategic management for improving the resilience of SMEs. This study covers only a sample frame in the provinces of Central Java and Yogyakarta Special Region. Nevertheless, the model was supported in this sample frame: all hypotheses were accepted, meaning that all variables are significant and the sample frame is a representative basis for testing this model. The concept of share value relational capital therefore has some power of generalization in increasing the resilience of SMEs. A replication of the share value relational capital study can open a research avenue toward a broader generalization of this new concept.

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request. https://doi.org/10.5281/zenodo.12669446

Agyapong F, Ojo TK (2018) Managing traffic congestion in the Accra central market, Ghana. J Urban Manag 7(2):85–96

Ahuja G, Morris Lampert C (2001) Entrepreneurship in the large corporation: A longitudinal study of how established firms create breakthrough inventions. Strat Manag J 22(6‐7):521–543

Akpan IJ, Udoh EAP, Adebisi B (2022) Small business awareness and adoption of state-of-the-art technologies in emerging and developing markets, and lessons from the COVID-19 pandemic. J Small Bus Entrepreneurship 34(2):123–140. https://doi.org/10.1080/08276331.2020.1820185

AL-Khatib A (2023) The impact of big data analytics capabilities on green supply chain performance: Is green supply chain innovation the missing link? Bus Process Manag J 29(1):22–42. https://doi.org/10.1108/BPMJ-08-2022-0416

Al-Omoush KS, Ribeiro-Navarrete S, Lassala C, Skare M (2022) Networking and knowledge creation: Social capital and collaborative innovation in responding to the COVID-19 crisis. J Innov Knowl 7(2):100181

Al-Smadi MO (2023) Examining the relationship between digital finance and financial inclusion: Evidence from MENA countries. Borsa Istanb Rev 23(2):464–472. https://doi.org/10.1016/j.bir.2022.11.016

Amit R, Schoemaker PJ (1993) Strategic assets and organizational rent. Strat Manag J 14(1):33–46. https://doi.org/10.1002/smj.4250140105

Arbuckle J (2016) IBM® SPSS® Amos™ user’s guide

Bagozzi RP, Yi Y (1988) On the evaluation of structural equation models. J Acad Mark Sci 16:74–94

Barney J (1991) Special theory forum the resource-based model of the firm: Origins, implications, and prospects. J Manag 17(1):97–98. https://doi.org/10.1177/014920639101700107

Bellaaj M (2021) Why and how do individual entrepreneurs use digital channels in an emerging market? Determinants of use and channel coordination. International Journal of Emerging Markets. https://doi.org/10.1108/IJOEM-08-2020-0882

Berry LL (1983) Relationship Marketing of Services Perspectives from 1983 and 2000. J Relatsh Mark 1(1):59–77. https://doi.org/10.1300/J366v01n01_05

Bogatyreva K, Shirokova G, Wales WJ, Germain R (2022) Foreign motivation? Managerial international exposure and international regional involvement effects on firms’ entrepreneurial orientation. Eur J Int Manag 18(1):52–75. https://doi.org/10.1504/EJIM.2022.123753

Borchert P, Zellmer-Bruhn DM (2010) Reproduced with permission of the copyright owner. Further reproduction prohibited without. J Allergy Clin Immunol 130(2):556. https://doi.org/10.1016/j.jaci.2012.05.050

Cao Q, Gedajlovic E, Zhang H (2009) Unpacking organizational ambidexterity: Dimensions, contingencies, and synergistic effects. Organ Sci 20(4):781–796. https://doi.org/10.1287/orsc.1090.0426

Corsini F, Annesi N, Annunziata E, Frey M (2023) Exploring success factors in food waste prevention initiatives of retailers: The critical role of digital technologies. Br Food J. https://doi.org/10.1108/BFJ-01-2023-0034

Craighead CW, Ketchen Jr DJ, Darby JL (2020) Pandemics and supply chain management research: Toward a theoretical toolbox. Decis Sci 51(4):838–866. https://doi.org/10.1111/deci.12468

Darnall N, Welch E, Cho S (2019) Sustainable supply chains and regulatory policy. Handbook on the Sustainable Supply Chain. Cheltenham, UK: Edward Elgar, 513–525. https://doi.org/10.4337/97817864342

Daum K (2005) Entrepreneurs: The artists of the business world. J Bus Strategy 26(5):53–57

Daum JH (2005) Beyond budgeting-breaking free from the annual fixed budget: A discussion between experts from Borealis, Nestlé, Unilever, and SAP. Measuring Business Excellence 9(1)

Dian W, Pambudi W, Leonardus S, Sukrisno S, Kundori K (2022) The mediating role of environmental sustainability between green human resources management, green supply chain, and green business: A conceptual model. Uncertain Supply Chain Manag 10(3):933–946

Dubey R (2023) Unleashing the potential of digital technologies in emergency supply chain: The moderating effect of crisis leadership. Ind Manag Data Syst 123(1):112–132. https://doi.org/10.1108/IMDS-05-2022-0307

Duchek S (2020) Organizational resilience: A capability-based conceptualization. Bus Res 13(1):215–246. https://doi.org/10.1007/s40685-019-0085-7

Dunlap D, Parente R, Geleilate J-M, Marion TJ (2016) Organizing for innovation ambidexterity in emerging markets: Taking advantage of supplier involvement and foreignness. J Leadersh Organ Stud 23(2):175–190. https://doi.org/10.1177/1548051816636621

Dweck CS (2006) Mindset: The New Psychology of Success. New York: Random House Publishing Group

Eisenhardt KM, Martin JA (2000) Dynamic capabilities: What are they? Strategic Manag J 21(10‐11):1105–1121. https://doi.org/10.1002/1097-0266(200010/11)21:10/11<1105::AID-MJ133>3.0.CO;2-E

Elango B, Pattnaik C (2007) Building capabilities for international operations through networks: A study of Indian firms. J Int Bus Stud 38:541–555. https://doi.org/10.1057/palgrave.jibs.8400280

Emerson TI (1962) Toward a general theory of the First Amendment. Yale Lj 72:877

Fazli-Salehi R, Azadi M, Torres IM, Zúñiga MÁ (2021) Antecedents and outcomes of brand identification with Apple products among Iranian consumers. J Relatsh Mark 20(2):135–155. https://doi.org/10.1080/15332667.2020.1755948

Fu H, Chen W, Huang X, Li M, Köseoglu MA (2020) Entrepreneurial bricolage, ambidexterity structure, and new venture growth: Evidence from the hospitality and tourism sector. Int J Hospitality Manag 85:102355. https://doi.org/10.1016/j.ijhm.2019.102355

George G, Mcgahan AM, Prabhu J (2012) Innovation for Inclusive Growth: Towards a Theoretical Framework and a Research Agenda. J Manag Stud 49(4):661–683. https://doi.org/10.1111/j.1467-6486.2012.01048.x

Gupta AK, Govindarajan V (2002) Cultivating of global mindset. Acad Manag Executive 16(1):116–126. https://doi.org/10.5465/AME.2002.6640211

Hair JF et al. (2006) Multivariate Data Analysis. 6th Edition, Prentice Hall, Upper Saddle River

Hamel G, Prahalad CK (1990) The core competence of the corporation. Harv Bus Rev 68(3):79–91

Hayes B (2009) Introductory phonology. UK: Wiley-Blackwell Publication

Heirati N, O’Cass A, Ngo LV (2013) The contingent value of marketing and social networking capabilities in firm performance. J Strategic Mark 21(1):82–98. https://doi.org/10.1080/0965254X.2012.742130

Hernita H, Surya B, Perwira I, Abubakar H, Idris M (2021) Economic business sustainability and strengthening human resource capacity based on increasing the productivity of small and medium enterprises (SMEs) in Makassar city, Indonesia. Sustainability 13(6):3177

Iborra M, Safón V, Dolz C (2020) What explains the resilience of SMEs? Ambidexterity capability and strategic consistency. Long Range Plan 53(6):101947. https://doi.org/10.1016/j.lrp.2019.101947

Isichei EE, Emmanuel Agbaeze K, Odiba MO (2020) Entrepreneurial orientation and performance in SMEs: The mediating role of structural infrastructure capability. Int J Emerg Mark 15(6):1219–1241. https://doi.org/10.1108/IJOEM-08-2019-0671

Ivanov D, Dolgui A, Sokolov B (2019) The impact of digital technology and Industry 4.0 on the ripple effect and supply chain risk analytics. Int J Prod Res 57(3):829–846. https://doi.org/10.1080/00207543.2018.1488086

Jansen JJP, Van Den Bosch FAJ, Volberda HW (2006) Exploratory innovation, exploitative innovation, and performance: effects of organizational antecedents and environmental moderators. Manag Sci 52:1661–1674. https://doi.org/10.1287/mnsc.1060.0576

Jia L, Du Y, Chu L, Zhang Z, Li F, Lyu D, Li Y, Zhu M, Jiao H, Song Y, Shi Y, Zhang H, Gong M, Wei C, Tang Y, Fang B, Guo D, Wang F, Zhou A, Qiu Q (2020) Prevalence, risk factors, and management of dementia and mild cognitive impairment in adults aged 60 years or older in China: a cross-sectional study. Lancet Public Health 5(12):e661–e671. https://doi.org/10.1016/S2468-2667(20)30185-7

Jun-feng N, Wei-ping L, Bin-he F, Zheng Z, Bo Y (2017) Predictive Operation Time Model for Information Processing Task of Crew in Armored Vehicle. Acta Armamentarii 38(2):233

Kamukama N, Ahiauzu A, Ntayi JM (2011) Competitive advantage: Mediator of intellectual capital and performance. J Intellect Cap 12(1):152–164. https://doi.org/10.1108/14691931111097953

Kansikas J, Murphy L (2011) Bonding family social capital and firm performance. Int J Entrepreneurship Small Bus 14(4):533–550. https://doi.org/10.1504/IJESB.2011.043474

Karami M, Tang J (2019) Entrepreneurial orientation and SMEs international performance: The mediating role of networking capability and experiential learning. Int Small Bus J 37(2):105–124. https://doi.org/10.1177/0266242618807275

Karmaker CL, Ahmed T, Ahmed S, Ali SM, Moktadir MA, Kabir G (2021) Improving supply chain sustainability in the context of COVID-19 pandemic in an emerging economy: Exploring drivers using an integrated model. Sustain Prod Consum 26:411–427. https://doi.org/10.1016/j.spc.2020.09.019

Khanal PB, Aubert BA, Bernard J-G, Narasimhamurthy R, Dé R (2022) Frugal innovation and digital effectuation for development: The case of Lucia. Inf Technol Dev 28(1):81–110. https://doi.org/10.1080/02681102.2021.1920874

Klein VB, Todesco JL (2021) COVID‐19 crisis and SMEs responses: The role of digital transformation. Knowl Process Manag 28(2):117–133. https://doi.org/10.1002/kpm.1660

Korber S, McNaughton RB (2017) Resilience and entrepreneurship: A systematic literature review. Int J Entrepreneurial Behav Res 24(7):1129–1154. https://doi.org/10.1108/IJEBR-10-2016-0356

Koronis E, Ponis S (2018) A strategic approach to crisis management and organizational resilience. J Bus Strategy 39(1):32–42. https://doi.org/10.1108/JBS-10-2016-0124

Kramer MR, Porter M (2011) Creating share value (Vol. 17). FSG Boston, MA, USA

Kun M (2022) Linkages between knowledge management process and corporate sustainable performance of Chinese small and medium enterprises: Mediating role of frugal innovation. Front Psychol 13:850820. https://doi.org/10.3389/fpsyg.2022.850820

Kundori, Sukrisno (2023) Valued-Based Selling Capability and Marketing Support Advantage: Suggestions and an Assessment of Past Research To Improve the Sales of a Shipping Company. KnE Social Sciences. https://doi.org/10.18502/kss.v8i9.13407

Lee Y, Kreiser PM (2018) Entrepreneurial orientation and ambidexterity: Literature review, challenges, and agenda for future research. The Challenges of Corporate Entrepreneurship in the Disruptive Age, 37–62. https://doi.org/10.1108/S1048-473620180000028002

Levy O, Beechler S, Taylor S, Boyacigiller NA (2007) What we talk about when we talk about “global mindset”: Managerial cognition in multinational corporations. J Int Bus Stud 38(2):231–258. https://doi.org/10.1057/palgrave.jibs.8400265

Long C, Vickers-Koch M (1995) Using core capabilities to create competitive advantage. Organ Dyn 24(1):7–22. https://doi.org/10.1016/0090-2616(95)90032-2

Lubatkin MH, Simsek Z, Ling Y, Veiga JF (2006) Ambidexterity and performance in small-to medium-sized firms: The pivotal role of top management team behavioral integration. J Manag 32(5):646–672. https://doi.org/10.1177/0149206306290712

Luger J, Raisch S, Schimmer M (2018) Dynamic balancing of exploration and exploitation: The contingent benefits of ambidexterity. Organ Sci 29(3):449–470. https://doi.org/10.1287/orsc.2017.1189

Lumpkin GT, Dess GG (1996) Clarifying the entrepreneurial orientation construct and linking it to performance. Acad Manag Rev 21(1):135–172

Madni AM, Jackson S (2009) Towards a conceptual framework for resilience engineering. IEEE Syst J 3(2):181–191. https://doi.org/10.1109/JSYST.2009.2017397

Martin M, Syamsuri S, Pujiastuti H, Hendrayana A (2021) Pengembangan E-Modul Berbasis Pendekatan Contextual Teaching and Learning Pada Materi Barisan Dan Deret Untuk Meningkatkan Minat Belajar Siswa SMP. J Derivat: J Matematika Dan Pendidik Matematika 8(2):72–87. https://doi.org/10.31316/j.derivat.v8i2.1927

Martins A (2023) Dynamic capabilities and SMEs performance in the COVID-19 era: The moderating effect of digitalization. Asia-Pac J Bus Adm 15(2):188–202. https://doi.org/10.1108/APJBA-08-2021-0370

Mata FJ, Fuerst WL, Barney JB (1995) Information technology and sustained competitive advantage: A resource-based analysis. MIS Quarterly, 487–505. https://doi.org/10.2307/249630

Messabia N, Fomi PR, Kooli C (2022) Managing restaurants during the COVID-19 crisis: Innovating to survive and prosper. J Innov Knowl 7(4):100234. https://doi.org/10.1016/j.jik.2022.100234

Metcalfe JS, Ramlogan R (2006) Creative destruction and the measurement of productivity change. Rev de l’OFCE 97(5):373–397. https://doi.org/10.3917/reof.073.0373

Mousa SK, Othman M (2020) The impact of green human resource management practices on sustainable performance in healthcare organisations: A conceptual framework. J Clean Prod 243:118595. https://doi.org/10.1016/j.jclepro.2019.118595

Murshed SM (2004) When does natural resource abundance lead to a resource curse? (Vol. 24137). International Institute for Environment and Development, Environmental Economics Programme

Musril HA, Saludin S, Firdaus W, Usanto S, Kundori K, Rahim R (2023) Using k-NN Artificial Intelligence for Predictive Maintenance in Facility Management. Int J Electr Electron Eng 10(6):1–8. https://doi.org/10.14445/23488379/IJEEE-V10I6P101

Nachtigall C, Kroehne U, Funke F, Steyer R (2003) Pros and cons of structural equation modeling. Methods Psychol Res Online 8(2):1–22

Nahapiet J, Ghoshal S (1998) Social capital, intellectual capital, and the organizational advantage. Acad Manag Rev 23(2):242–266

Nahavandi A, Malekzadeh AR (1988) Acculturation in mergers and acquisitions. Int Executive 30(1):10–12. https://doi.org/10.1002/tie.5060300103

Napier NK, Nilsson M (2008) The Creative Discipline: Mastering the Art and Science of Innovation. Bloomsbury Academic. https://books.google.co.id/books?id=daDt4IBN6kYC

Nassani AA, Sinisi C, Mihai D, Paunescu L, Yousaf Z, Haffar M (2022) Towards the Achievement of Frugal Innovation: Exploring Major Antecedents among SMEs. Sustainability 14(7):4120. https://doi.org/10.3390/su14074120

Ngugi IK, Johnsen RE, Erdélyi P (2010) Relational capabilities for value co‐creation and innovation in SMEs. J Small Bus Enterp Dev 17(2):260–278. https://doi.org/10.1108/14626001011041256

Nofiani D, Indarti N, Lukito-Budi AS, Manik HFGG (2021) The dynamics between balanced and combined ambidextrous strategies: A paradoxical affair about the effect of entrepreneurial orientation on SMEs’ performance. J Entrepreneurship Emerg Econ 13(5):1262–1286. https://doi.org/10.1108/JEEE-09-2020-0331

Nunnally JC, Bernstein IH (1994) Psychometric Theory. New York, NY: McGraw-Hill

O’Reilly III CA, Tushman ML (2008) Ambidexterity as a dynamic capability: Resolving the innovator’s dilemma. Res Organ Behav 28:185–206. https://doi.org/10.1016/j.riob.2008.06.002

Ojha NP, Ayilavarapu D (2016) Leapfrogging The World With Frugal Innovation

Olaleye SA, Mogaji E, Agbo FJ, Ukpabi D, Adusei AG (2022) The composition of data economy: A bibliometric approach and TCCM framework of conceptual, intellectual and social structure. Inf Discov Deliv 51(2):223–240. https://doi.org/10.1108/IDD-02-2022-0014

Ormrod JE (2012) Human Learning (6th ed.). Pearson

Pambudi WF, Dian W, Suherman S, Leonardus SBA, Sukrisno S (2022) How Do Green Investment, Corporate Social Responsibility Disclosure, and Social Collaborative Initiatives Drive Firm’s Distribution Performance? J Distrib Sci 20(4):51–63

Pambudi Widiatmaka F, Sukrisno S, Suherman S, Bintang AM, Leonardus S, Cahya Susena K (2023) Share value relational capital: Suggestions for the future and an assessment of past research in driving marketing performance. J Innov Business Econ 6(02). https://doi.org/10.22219/jibe.v6i02.22495

Park Y, Yip T, Park H (2019) An analysis of pilotage marine accidents in Korea. The Asian Journal of Shipping and Logistics. https://www.sciencedirect.com/science/article/pii/S2092521219300070

Peng MW, Luo Y (2000) Managerial ties and performance in a transition economy: The nature of a micro-macro link. Acad Manag J 43(3):486–501. https://doi.org/10.5465/1556406

Pfeffer J, Salancik GR (2003) The external control of organizations: A resource dependence perspective. Stanford University Press

Qu Y, Eguchi A, Ma L, Wan X, Mori C, Hashimoto K (2023) Role of the gut–brain axis via the subdiaphragmatic vagus nerve in stress resilience of 3,4-methylenedioxymethamphetamine in mice exposed to chronic restrain stress. Neurobiology of Disease, 189. https://doi.org/10.1016/j.nbd.2023.106348

Radjou N, Prabhu J, Ahuja S (2012) Jugaad innovation: Think frugal, be flexible, generate breakthrough growth. John Wiley & Sons

Rafique MZ, Nadeem AM, Xia W, Ikram M, Shoaib HM, Shahzad U (2022) Does economic complexity matter for environmental sustainability? Using ecological footprint as an indicator. Environ, Dev Sustainability 24(4):4623–4640. https://doi.org/10.1007/s10668-021-01625-4

Rehman S, Mohamed R, Ayoup H (2019) The mediating role of organizational capabilities between organizational performance and its determinants. Journal of Global Entrepreneurship Research, 9(1). https://doi.org/10.1186/s40497-019-0155-5

Ross ML (1999) The political economy of the resource curse. World Politics 51(2):297–322. https://doi.org/10.1017/S0043887100008200

Roundy PT, Bayer MA (2019) To bridge or buffer? A resource dependence theory of nascent entrepreneurial ecosystems. J Entrepreneurship Emerg Economies 11(4):550–575. https://doi.org/10.1108/JEEE-06-2018-0064

Ruel H, Rowlands H, Njoku E (2020) Digital business strategizing: The role of leadership and organizational learning. Competitiveness Rev: Int Bus J 31(1):145–161. https://doi.org/10.1108/CR-11-2019-0109

Savastano M, Zentner H, Spremić M, Cucari N (2022) Assessing the relationship between digital transformation and sustainable business excellence in a turbulent scenario. Tot Qual Manag Business Excellence 1–22. https://doi.org/10.1108/JMTM-07-2021-0267

Shibin K, Dubey R, Gunasekaran A, Luo Z, Papadopoulos T, Roubaud D (2018) Frugal innovation for supply chain sustainability in SMEs: Multi-method research design. Prod Plan Control 29(11):908–927

Soper K (2006) Conceptualizing needs in the context of consumer politics. J Consum Policy 29(4):355–372. https://doi.org/10.1007/s10603-006-9017-y

Sousa SR, de O, Melchior C, Da Silva WV, Zanini RR, Su Z, da Veiga CP (2021) Show you the money–firms investing in worker safety have better financial performance: Insights from a mapping review. Int J Workplace Health Manag 14(3):310–331. https://doi.org/10.1108/IJWHM-11-2020-0200

Stroumpoulis A, Kopanaki E (2022) Theoretical perspectives on sustainable supply chain management and digital transformation: A literature review and a conceptual framework. Sustainability 14(8):4862

Tabachnick BG, Fidell LS, Ullman JB (2013) Using multivariate statistics (Vol. 6). Pearson Boston, MA

Tambunan T (2019) Recent evidence of the development of micro, small and medium enterprises in Indonesia. J Glob Entrepreneurship Res 9(1):18. https://doi.org/10.1186/s40497-018-0140-4

Tarka P (2018) An overview of structural equation modeling: Its beginnings, historical development, usefulness and controversies in the social sciences. Qual Quant 52:313–354

Teece DJ (2012) Dynamic capabilities: Routines versus entrepreneurial action. J Manag Stud 49(8):1395–1401. https://doi.org/10.1111/j.1467-6486.2012.01080

Teece DJ, Teece DJ (2007) Explicating Dynamic Capabilities: The Nature and Microfoundations of (Sustainable) Enterprise Performance Stable URL: http://www.jstor.org/stable/20141992 AND MICROFOUNDATIONS OF (SUSTAINABLE) ENTERPRISE PERFORMANCE EXPLICATING DYNAMIC CAPABILITIES. 28(13):1319–1350. https://doi.org/10.1002/smj.64()Received

Tehseen S, Sajilan S (2016) Network competence based on resource-based view and resource dependence theory. Int J Trade Glob Mark 9(1):60–82. https://doi.org/10.1504/IJTGM.2016.074138

Thukral E (2021) COVID‐19: Small and medium enterprises challenges and responses with creativity, innovation, and entrepreneurship. Strategic Change 30(2):153–158. https://doi.org/10.1002/jsc.2399

Article   PubMed Central   Google Scholar  

Tian H, Dogbe CSK, Pomegbe WWK, Sarsah SA, Otoo COA (2021) Organizational learning ambidexterity and openness, as determinants of SMEs’ innovation performance. Eur J Innov Manag 24(2):414–438. https://doi.org/10.1108/EJIM-05-2019-0140

Todeschini BV, Cortimiglia MN, Callegaro-de- Menezes D, Ghezzi A (2017) Innovative and sustainable business models in the fashion industry: Entrepreneurial drivers, opportunities, and challenges. Bus Horiz 60(6):759–770

Urbach N, Ahlemann F, Böhmann T, Drews P, Brenner W, Schaudel F, Schütte R (2019) The impact of digitalization on the IT department. Bus Inf Syst Eng 61:123–131. https://doi.org/10.1007/s12599-018-0570-0

Vahlne J, Jonsson A (2016) Ambidexterity as a dynamic capability in the globalization of the multinational business enterprise (MBE): Case studies of AB Volvo and IKEA. Int Business Rev 2015. https://doi.org/10.1016/j.ibusrev.2016.05.006

Vesci M, Feola R, Parente R, Radjou N (2021) How to save the world during a pandemic event. A case study of frugal innovation. RD Manag 51(4):352–363

Villena VH, Revilla E, Choi TY (2011) The dark side of buyer– supplier relationships: A social capital perspective. J Oper Manag 29(6):561–576. https://doi.org/10.1016/j.jom.2010.09.001

Vuong QH, Napier NK (2015) International Journal of Intercultural Relations Acculturation and global mind sponge: An emerging market perspective. Int J Intercultural Relat 49:354–367. https://doi.org/10.1016/j.ijintrel.2015.06.003

Vuong QH, Napier NK (2014) Making creativity: the value of multiple filters in the innovation process Quan Hoang Vuong *. 3(4):294–327

Weaven S, Quach S, Thaichon P, Frazer L, Billot K, Grace D (2021) Surviving an economic downturn: Dynamic capabilities of SMEs. J Bus Res 128:109–123. https://doi.org/10.1016/j.jbusres.2021.02.009

Weidong S (2007) Gaining Economic Competitive Advantages in Poor Counties Based on Resource-based Theory. China Popul, Resour Environ 17(4):25–29. https://doi.org/10.1016/S1872-583X(08)60001-7

Welbourne TM, Pardo-del-Val M (2009) Relational capital: Strategic advantage for small and medium-size enterprises (SMEs) through negotiation and collaboration. Group Decis Negotiation 18:483–497. https://doi.org/10.1007/s10726-008-9138-6

Wernerfelt B (1995) The resource‐based view of the firm: Ten years after. Strat Manag J 16(3):171–174. https://doi.org/10.1002/smj.4250160303

Weyrauch T, Herstatt C (2017) What is frugal innovation? Three defining criteria. J Frugal Innov 2(1):1–17. https://doi.org/10.1186/s40669-016-0005-y

Widiatmaka FP, Sularno H, Prasetyo AN, Djari JA, Samodro LMAB, Munawar M, Listyorini H, Supriyanto S (2023) How Interaction Should Transform, Value be Developed to Drive Teamwork Performance? An Empirical Research in Merchant Marine Colleges: PORTUGUES. Int J Professional Bus Rev 8(4):e01044

Zhang T, Shi Z-Z, Shi Y-R, Chen N-J (2022) Enterprise digital transformation and production efficiency: Mechanism analysis and empirical research. Econ Res -Ekonomska Istraživanja 35(1):2781–2792. https://doi.org/10.1080/1331677X.2021.1980731

Zhang X (2018) Frugal innovation and the digital divide: Developing an extended model of the diffusion of innovations. Int J Innov Stud 2(2):53–64. https://doi.org/10.1016/j.ijis.2018.06.001

Zhang G, Thai V, Yuen K, Loh H, Zhou Q (2018) Addressing the epistemic uncertainty in maritime accidents modelling using Bayesian network with interval probabilities. Safety Science, Query date: 2023-03-25 12:09:02. https://www.sciencedirect.com/science/article/pii/S0925753516302685

Download references

Acknowledgements

We would like to thank the Humanities and Social Sciences Funding from the Ministry of Transportation of the Republic of Indonesia through Merchant Marine Polytechnic Semarang, for supporting this research and publication. We also would like to thank the individuals and organisations who generously shared their time and experience for this project.

Author information

Authors and affiliations.

Nautical, Technical, Port And Shipping, Merchant Marine Polytechnic, Semarang, 50242, Indonesia

Suherman, Florentinus Pambudi Widiatmaka, Fitri Kensiwi, Didik Dwi Suharso, Sukirno & Pranyoto

Management Department, Dehasen University, Bengkulu, 38228, Indonesia

Susena Karona Cahya

Technical Department, AMNI Maritime University, Semarang, 50246, Indonesia

Management And Tourism Department, Indonesia College Of Tourism Economic, Semarang, 50233, Indonesia

Haniek Listyorini, Sapto Supriyanto, Pranoto & Sukrisno

Contributions

Formal analysis: SHM, SKRSN, HNK, PRNT, PMBD. Resources: KNDR, SPT, KRN, DDK, SHM, SKRSN, PRNT, PMBD. Writing original draft: FTRK, PRNYT, SKRN, SHM, SKSRN, HNK, PRNT, PMBD. All authors discussed the results and contributed to the finalised manuscript.

Corresponding author

Correspondence to Sukrisno.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Ethical approval

The Ethics Committee of Merchant Marine Polytechnic granted approval for this research. The study’s methodology adhered strictly to the principles outlined in the Declaration of Helsinki. Ethical clearance was sought and obtained on July 19, 2024, before any data collection activities began. This ensured that all research procedures were in compliance with established ethical guidelines.

Informed consent

Informed consent was obtained from all participants in the study, for both participation and publication.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

About this article

Cite this article.

Suherman, Widiatmaka, F.P., Kensiwi, F. et al. Resilience in tourism-based SMEs driven by initiatives and strategies through share value relational capital viewed from a resource-based theory perspective. Humanit Soc Sci Commun 11, 1128 (2024). https://doi.org/10.1057/s41599-024-03607-z

Received: 28 November 2023

Accepted: 12 August 2024

Published: 03 September 2024

DOI: https://doi.org/10.1057/s41599-024-03607-z

definition of regression analysis in research

COMMENTS

  1. Regression analysis

    First, regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Second, in some situations regression analysis can be used to infer causal relationships between the independent and dependent variables.

  2. Regression Analysis

    Regression analysis is a quantitative research method which is used when the study involves modelling and analysing several variables, where the relationship includes a dependent variable and one or more independent variables. In simple terms, regression analysis is a quantitative method used to test the nature of relationships between a ...

  3. Regression: Definition, Analysis, Calculation, and Example

    Regression is a statistical measurement that attempts to determine the strength of the relationship between one dependent variable and a series of other variables.

  4. Understanding Regression Analysis: Overview and Key Use

    Regression analysis is a fundamental statistical method that helps us predict and understand how different factors (aka independent variables) influence a specific outcome (aka dependent variable). Imagine you're trying to predict the value of a house.

  5. Regression Analysis

    Regression analysis is a set of statistical processes for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors').

  6. Explained: Regression analysis

    Regression analysis. It sounds like a part of Freudian psychology. In reality, a regression is a seemingly ubiquitous statistical tool appearing in legions of scientific papers, and regression analysis is a method of measuring the link between two or more phenomena.

  7. (PDF) Regression Analysis

    7.1 Introduction. Regression analysis is one of the most frequently used tools in market research. In its simplest form, regression analysis allows market researchers to analyze relationships ...

  8. A Refresher on Regression Analysis

    A Refresher on Regression Analysis. Understanding one of the most important types of data analysis. By Amy Gallo, November 04, 2015. You probably know by now that ...

  9. What is Regression Analysis?

    Regression analysis is a widely used set of statistical analysis methods for gauging the true impact of various factors on specific facets of a business. These methods help data analysts better understand relationships between variables, make predictions, and decipher intricate patterns within data. Regression analysis enables better predictions and more informed decision-making by tapping ...

  10. Regression Analysis

    Regression analysis is a set of statistical methods used to estimate relationships between a dependent variable and one or more independent variables.

  11. Regression Analysis: Definition, Types, Usage & Advantages

    Overall, regression analysis saves the survey researchers' additional efforts in arranging several independent variables in tables and testing or calculating their effect on a dependent variable. Different types of analytical research methods are widely used to evaluate new business ideas and make informed decisions.

  12. Regression Analysis

    Linear regression analysis is one of the most important statistical methods. It examines the linear relationship between a metric-scaled dependent variable (also called endogenous, explained, response, or predicted variable) and one or more metric-scaled independent variables (also called exogenous, explanatory, control, or predictor variables).

  13. Regression Analysis for Prediction: Understanding the Process

    Regression analysis is a statistical technique for determining the relationship between a single dependent (criterion) variable and one or more independent (predictor) variables. The analysis yields a predicted value for the criterion resulting from a linear combination of the predictors. According to Pedhazur [15], regression analysis has 2 uses ...

  14. Regression Analysis

    Definition 7.1: Regression analysis is a statistical method for analyzing a relationship between two or more variables in such a manner that one of the variables can be predicted or explained by the information on the other variables.

  15. Sage Research Methods

    Understanding Regression Analysis: An Introductory Guide presents the fundamentals of regression analysis, from its meaning to uses, in a concise, easy-to-read, and non-technical style. It illustrates how regression coefficients are estimated, interpreted, and used in a variety of settings within the social sciences, business, law, and public ...

  16. Regression Analysis

    Regression analysis is a technique that permits one to study and measure the relation between two or more variables. Starting from data registered in a sample, regression analysis seeks to determine an estimate of a mathematical relation between two or more variables.

  17. What Is Regression Analysis? Types, Importance, and Benefits

    Regression analysis is a powerful tool used to derive statistical inferences for the future using observations from the past. It identifies the connections between variables occurring in a dataset and determines the magnitude of these associations and their significance on outcomes.

  18. Regression Analysis

    Definition. Regression analysis is a statistical method for investigating the relationships between variables, which includes a number of techniques for modeling and analyzing several variables. The focus is on the relationship between a dependent variable and one or more independent variables (Sen and Srivastava 1990).

  19. What is regression analysis? Definition and examples

    In statistical modelling, regression analysis is a way of mathematically sorting out a series of variables to determine which ones have an impact and how they relate to one another.

  20. What is Regression Analysis? Definition, Types, and Examples

    In simple terms, regression analysis identifies the variables that have an impact on another variable. The regression model is primarily used in finance, investing, and other areas to determine the strength and character of the relationship between one dependent variable and a series of other variables.

  21. Regression Analysis: Step by Step Articles, Videos, Simple Definitions

    How to articles for regression analysis. Find a regression slope by hand or using technology like Excel or SPSS. Scatter plots, linear regression and more.

  22. What Is Regression Analysis in Business Analytics?

    Regression analysis is the statistical method used to determine the structure of a relationship between variables. Learn to use it to inform business decisions.

  23. Understanding and interpreting regression analysis

    Regression analysis is a powerful and useful statistical procedure with many implications for nursing research. It enables researchers to describe, predict and estimate the relationships and draw plausible conclusions about the interrelated variables in relation to any studied phenomena.

  24. Resilience in tourism-based SMEs driven by initiatives and ...

    The analysis uses logistic regression with dichotomous response and predictor variables on structured tables of count data, representing firm performance as a result of capital resources, physical ...
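
Several of the definitions above describe regression as estimating a relationship between a dependent variable and an independent variable and then using it for prediction. As a minimal sketch of that idea, the following Python example fits a simple linear regression by ordinary least squares using only the standard library. The function names (fit_simple_linear_regression, predict) and the data values are hypothetical and chosen purely for illustration; they do not come from any of the sources listed.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y ≈ b0 + b1 * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and sum of cross-deviations of x and y
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must contain at least two distinct values")
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predicted value of the dependent variable for a new x."""
    return intercept + slope * x_new


# Made-up illustrative data (assumption): y depends linearly on x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

b0, b1 = fit_simple_linear_regression(x, y)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
print(f"prediction at x = 6: {predict(b0, b1, 6.0):.2f}")
```

In practice this calculation is usually delegated to statistical software, and models with several independent variables or a dichotomous response (such as the logistic regression mentioned in the last entry) need different estimation procedures than this one-predictor sketch.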