8 case studies and real world examples of how Big Data has helped keep on top of competition

Fast, data-informed decision-making can drive business success. Faced with high customer expectations, marketing challenges, and global competition, many organizations look to data analytics and business intelligence for a competitive advantage.

Serving up personalized ads based on browsing history, giving every employee contextual access to KPI data, and centralizing data from across the business into one digital ecosystem where processes can be thoroughly reviewed are all examples of business intelligence at work.

Organizations invest in data science because it promises to bring competitive advantages.

Data has become an actionable asset, and new machine-learning tools are built to exploit that fact. As a result, organizations are on the brink of mobilizing data not only to predict the future but also to increase the likelihood of certain outcomes through prescriptive analytics.

Here are some case studies that show some ways BI is making a difference for companies around the world:

1) Starbucks:

With 90 million transactions a week across 25,000 stores worldwide, the coffee giant is in many ways on the cutting edge of using big data and artificial intelligence to direct marketing, sales, and business decisions.

Through its popular loyalty card program and mobile application, Starbucks owns individual purchase data from millions of customers. Using this information and BI tools, the company predicts purchases and sends individual offers for what customers will likely prefer via its app and email. This system draws existing customers into its stores more frequently and increases sales volumes.

The same intel that helps Starbucks suggest new products to try also helps the company send personalized offers and discounts that go far beyond a special birthday discount. Additionally, a customized email goes out to any customer who hasn’t visited a Starbucks recently with enticing offers—built from that individual’s purchase history—to re-engage them.

2) Netflix:

The online entertainment company’s roughly 150 million subscribers give it a massive BI advantage.

Netflix has digitized its interactions with those subscribers. It collects data from each of its users and, with the help of data analytics, understands their behavior and viewing patterns. It then leverages that information to recommend movies and TV shows tailored to each subscriber’s tastes and preferences.

According to Netflix, around 80% of viewer activity is triggered by personalized algorithmic recommendations. Where Netflix gains an edge over its peers is that, by collecting many different data points, it creates detailed profiles of its subscribers, which helps it engage them better.

Netflix’s recommendation system drives more than 80% of the content its subscribers stream, and the company has estimated that it is worth roughly $1 billion a year through customer retention. For this reason, Netflix does not have to invest heavily in advertising and marketing its shows: it already has a precise estimate of how many people will be interested in watching a given title.
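Netflix’s production recommender is proprietary and far more sophisticated than anything that fits in a few lines, but the underlying idea can be sketched: score titles a subscriber has not watched by their similarity to titles they have. Everything below (the ratings matrix, the scoring rule) is invented for illustration.

```python
# Illustrative only: a toy item-item similarity recommender.
import numpy as np

# Toy matrix: rows = subscribers, columns = titles, values = watch scores.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

# Cosine similarity between title columns.
norms = np.linalg.norm(ratings, axis=0)
similarity = (ratings.T @ ratings) / np.outer(norms, norms)

def recommend(user_row, top_n=2):
    """Rank titles the user has not watched by similarity-weighted score."""
    seen = ratings[user_row] > 0
    scores = similarity[:, seen] @ ratings[user_row, seen]
    unseen = np.where(~seen)[0]
    return sorted(unseen, key=lambda t: -scores[t])[:top_n]

print(recommend(0))  # titles most similar to what subscriber 0 already watched
```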

3) Coca-Cola:

Coca-Cola is the world’s largest beverage company, with over 500 soft drink brands sold in more than 200 countries. Given the size of its operations, Coca-Cola generates a substantial amount of data across its value chain, including sourcing, production, distribution, sales, and customer feedback, which it can leverage to drive successful business decisions.

Coca-Cola has been investing extensively in research and development, especially in AI, to better leverage the mountain of data it collects from customers all around the world. This initiative has helped it better understand consumer trends in price, flavors, and packaging, as well as consumers’ preference for healthier options in certain regions.

With 35 million Twitter followers and a whopping 105 million Facebook fans, Coca-Cola benefits from its social media data. Using AI-powered image-recognition technology, it can track when photographs of its drinks are posted online. This data, paired with the power of BI, gives the company important insights into who is drinking its beverages, where they are, and why they mention the brand online. The information helps serve consumers more targeted advertising, which is four times more likely than a regular ad to result in a click.

Coca-Cola is increasingly betting on BI, data analytics, and AI to drive its strategic business decisions. From its innovative Freestyle fountain machine to finding new ways to engage with customers, Coca-Cola is well equipped to stay at the top of the competition. In a digital world that is increasingly dynamic, with changing customer behavior, Coca-Cola relies on Big Data to gain and maintain its competitive advantage.

4) American Express GBT

American Express Global Business Travel, popularly known as Amex GBT, is a multinational travel and meetings-program management corporation that operates in over 120 countries and has over 14,000 employees.

Challenges:

Scalability – Creating a single portal for around 945 separate data files from internal and customer systems would have taken over six months with the existing BI tool. That tool had been used for internal purposes, and scaling the solution to such a large population while keeping costs optimal was a major challenge.

Performance – The existing system had limitations in shifting to the cloud; the time and manual effort required were immense.

Data Governance – Maintaining user data security and privacy was of utmost importance for Amex GBT

The company was looking to protect and grow its market share by differentiating its core services, and it sought a resource to manage and drive its online travel program capabilities forward. Amex GBT decided to make a strategic investment in building smart analytics around its booking software.

The solution equipped users to view their travel ROI by breaking it into three categories: cost, time, and value. Each category has individual KPIs that are measured to evaluate the performance of a travel plan.

Results:

Cost Savings – Reduced travel expenses by 30%

Time to Value – Initially it took a week to onboard new users onto the platform. With Premier Insights, that time has been reduced to a single day, and the process has become much simpler and more effective.

Savings on Spend – The product notifies users of any available booking offers that can help them save on their expenditure, recommending potential savings such as better flight timings, booking dates, and travel dates.

Adoption – The product’s ease of use, quick scale-up, real-time reports, and interactive Premier Insights dashboards increased global online adoption for Amex GBT.

5) Airline Solutions Company: BI Accelerates Business Insights

Airline Solutions provides booking tools, revenue management, web and mobile itinerary tools, and other technology for airlines, hotels, and other companies in the travel industry.

Challenge: The travel industry is remarkably dynamic and fast-paced, and the airline solution provider’s clients needed advanced tools that could provide real-time data on customer behavior and actions.

Solution: The company developed an enterprise travel data warehouse (ETDW) to hold its enormous amounts of data. Its executive dashboards provide near real-time insights in user-friendly environments, with a 360-degree overview of business health, reservations, operational performance, and ticketing.

Results: The scalable infrastructure, graphical user interface, data aggregation, and ability to work collaboratively have led to more revenue and increased client satisfaction.

6) A specialty US Retail Provider: Leveraging prescriptive analytics

Challenge/Objective: A specialty US retail provider wanted to modernize its data platform to help the business make real-time decisions while also leveraging prescriptive analytics. It wanted to discover the true value of the data generated by its multiple systems and understand the patterns, both known and unknown, in sales, operations, and omnichannel retail performance.

Solution: We helped build a modern data solution that consolidated their data in a data lake and a data warehouse, making it easier to extract value in real time. We integrated our solution with their OMS, CRM, Google Analytics, Salesforce, and inventory management system. The data was modeled so that it can be fed into machine-learning algorithms and leveraged easily in the future.

Results: The customer had visibility into their data from day one, something they had wanted for some time. They were also able to build more reports, dashboards, and charts to understand and interpret the data, and in some cases they gained real-time visibility into, and analysis of, in-store purchases by geography.

7) Logistics startup: becoming the “Uber of the trucking sector” with the help of data analytics

Challenge: A startup specializing in analyzing vehicle and driver performance, by collecting data from in-vehicle sensors (vehicle telemetry) and order patterns, wanted to become the “Uber of the trucking sector.”

Solution: We developed a customized backend for the client’s trucking platform so they could monetize transporters’ empty return trips by creating a marketplace for them. The approach used a combination of an AWS data lake, AWS microservices, machine learning, and analytics, delivering:

  • Reduced fuel costs
  • Optimized reloads
  • More accurate driver/truck schedule planning
  • Smarter routing
  • Fewer empty return trips
  • Deeper analysis of driver patterns, breaks, routes, etc.

8) Challenge/Objective: A customer in a niche segment, competing against market behemoths, was looking to become the niche segment leader.

Solution: We developed a customized analytics and AI platform that ingests CRM, OMS, e-commerce, and inventory data and produces both real-time and batch-driven analytics. The approach used a combination of AWS microservices, machine learning, and analytics, delivering:

  • Reduced customer churn
  • Optimized order fulfillment
  • More accurate demand planning
  • Improved product recommendations
  • Improved last-mile delivery

How can we help you harness the power of data?

At Systems Plus, our BI and analytics specialists help you leverage data to understand trends and derive insights by streamlining the searching, merging, and querying of data. From improving your CX and employee performance to predicting new revenue streams, our BI and analytics expertise helps you make data-driven decisions that save costs and take your growth to the next level.

GE’s Big Bet on Data and Analytics

Seeking opportunities in the Internet of Things, GE expands into industrial analytics.

February 18, 2016. By Laura Winig.

If software experts truly knew what Jeff Immelt and GE Digital were doing, there’s no other software company on the planet where they would rather be. –Bill Ruh, CEO of GE Digital and CDO for GE

In September 2015, multinational conglomerate General Electric (GE) launched an ad campaign featuring a recent college graduate, Owen, excitedly breaking the news to his parents and friends that he has just landed a computer programming job — with GE. Owen tries to tell them that he will be writing code to help machines communicate, but they’re puzzled; after all, GE isn’t exactly known for its software. In one ad, his friends feign excitement, while in another, his father implies Owen may not be macho enough to work at the storied industrial manufacturing company.

Owen’s Hammer: GE’s ad campaign aimed at Millennials emphasizes its new digital direction.

The campaign was designed to recruit Millennials to join GE as Industrial Internet developers and remind them — using GE’s new watchwords, “The digital company. That’s also an industrial company.” — of GE’s massive digital transformation effort. GE has bet big on the Industrial Internet — the convergence of industrial machines, data, and the Internet (also referred to as the Internet of Things) — committing $1 billion to put sensors on gas turbines, jet engines, and other machines; connect them to the cloud; and analyze the resulting flow of data to identify ways to improve machine productivity and reliability. “GE has made significant investment in the Industrial Internet,” says Matthias Heilmann, Chief Digital Officer of GE Oil & Gas Digital Solutions. “It signals this is real, this is our future.”

While many software companies like SAP, Oracle, and Microsoft have traditionally been focused on providing technology for the back office, GE is leading the development of a new breed of operational technology (OT) that literally sits on top of industrial machinery.

About the Author

Laura Winig is a contributing editor to MIT Sloan Management Review.



How Big Data is Used by Netflix, AccuWeather, China Eastern Airlines, Etsy, and mLogica: Business Case Studies

By Shelby Hiter

A growing number of enterprises are pooling terabytes and petabytes of data, but many of them are grappling with ways to apply their big data as it grows. 

How can companies determine what big data solutions will work best for their industry, business model, and specific data science goals? 

Check out these big data enterprise case studies from some of the top big data companies and their clients to learn about the types of solutions that exist for big data management.

Enterprise case studies:

  • Netflix on AWS
  • AccuWeather on Microsoft Azure
  • China Eastern Airlines on Oracle Cloud
  • Etsy on Google Cloud
  • mLogica on SAP HANA Cloud


Netflix is one of the largest media and technology enterprises in the world, with thousands of shows that it hosts for streaming as well as a growing media production division. Netflix stores billions of data sets related to audiovisual content, consumer metrics, and recommendation engines. The company required a solution that would allow it to store, manage, and optimize viewer data. As its studio has grown, Netflix has also needed a platform that enables quicker and more efficient collaboration on projects.

“Amazon Kinesis Streams processes multiple terabytes of log data each day. Yet, events show up in our analytics in seconds,” says John Bennett, senior software engineer at Netflix. 

“We can discover and respond to issues in real-time, ensuring high availability and a great customer experience.”
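What publishing such a log event can look like with boto3, Kinesis’s standard Python client, is sketched below. The stream name and event fields are invented for illustration and are not Netflix’s actual schema.

```python
# Hypothetical sketch: push one traffic-flow log event into a Kinesis stream.
import json
import time

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

def publish_event(event, stream="traffic-flow-logs"):
    """Write one record; the partition key keeps a host's events ordered."""
    kinesis.put_record(
        StreamName=stream,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=event["source_ip"],
    )

publish_event({
    "source_ip": "10.0.0.12",
    "dest_ip": "10.0.3.7",
    "bytes": 5120,
    "ts": time.time(),
})
```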

Industries: Entertainment, media streaming

Use cases: Computing power, storage scaling, database and analytics management, recommendation engines powered through AI/ML, video transcoding, cloud collaboration space for production, traffic flow processing, scaled email and communication capabilities

  • Now using over 100,000 server instances on AWS for different operational functions
  • Used AWS to build a studio in the cloud for content production that improves collaborative capabilities
  • Produced entire seasons of shows via the cloud during COVID-19 lockdowns
  • Scaled and optimized mass email capabilities with Amazon Simple Email Service (Amazon SES)
  • Netflix’s Amazon Kinesis Streams-based solution now processes billions of traffic flows daily

Read the full Netflix on AWS case study here.

AccuWeather is one of the oldest and most trusted providers of weather forecast data. The company provides an API that other businesses can use to embed its weather content into their own systems. AccuWeather wanted to move its data processes to the cloud, but the traditional GRIB 2 format for weather data is not supported by most data management platforms. With Microsoft Azure, Azure Data Lake Storage, and Azure Databricks (AI), AccuWeather found a solution that converts the GRIB 2 data, analyzes it in more depth than before, and stores it in a scalable way.

“With some types of severe weather forecasts, it can be a life-or-death scenario,” says Christopher Patti, CTO at AccuWeather. 

“With Azure, we’re agile enough to process and deliver severe weather warnings rapidly and offer customers more time to respond, which is important when seconds count and lives are on the line.”
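The case study does not publish AccuWeather’s code, but one common open-source route to the GRIB 2 problem described above is to decode the files with xarray’s cfgrib engine and flatten them into a tabular format that cloud analytics platforms can ingest. A rough sketch under those assumptions, with placeholder file names:

```python
# Sketch only: xarray + cfgrib are open-source tools not named in the case
# study. Requires the cfgrib and pyarrow packages to be installed.
import xarray as xr

# Decode a GRIB 2 message file into a labeled, queryable dataset.
ds = xr.open_dataset("forecast.grib2", engine="cfgrib")

# Flatten the lat/lon grid into rows so the forecast can be stored in a
# data lake (e.g. as Parquet/Delta) and queried like any other table.
df = ds.to_dataframe().reset_index()
df.to_parquet("forecast.parquet")
```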

Industries: Media, weather forecasting, professional services

Use cases: Making legacy and traditional data formats usable for AI-powered analysis, API migration to Azure, data lakes for storage, more precise reporting and scaling

  • GRIB 2 weather data made operational for AI-powered next-generation forecasting engine, via Azure Databricks
  • Delta Lake storage layer helps to create data pipelines and improve accessibility
  • Improved speed, accuracy, and localization of forecasts via machine learning
  • Real-time measurement of API key usage and performance
  • Ability to extract weather-related data from smart-city systems and self-driving vehicles

Read the full AccuWeather on Microsoft Azure case study here.

China Eastern Airlines, one of the largest airlines in the world, is working to improve safety, efficiency, and overall customer experience through big data analytics. With Oracle’s cloud setup and a large portfolio of analytics tools, it now has access to more in-flight, aircraft, and customer metrics.

“By processing and analyzing over 100 TB of complex daily flight data with Oracle Big Data Appliance, we gained the ability to easily identify and predict potential faults and enhanced flight safety,” says Wang Xuewu, head of China Eastern Airlines’ data lab.  

“The solution also helped to cut fuel consumption and increase customer experience.”

Industries: Airline, travel, transportation

Use cases: Increased flight safety and fuel efficiency, reduced operational costs, big data analytics

  • Optimized big data analysis of flight angle, take-off speed, and landing speed, maximizing predictive analytics for engine and flight safety
  • Multi-dimensional analysis on over 60 attributes provides advanced metrics and recommendations to improve aircraft fuel use
  • Advanced spatial analytics on the travelers’ experience, with metrics covering in-flight cabin service, baggage, ground service, marketing, flight operation, website, and call center
  • Using Oracle Big Data Appliance to integrate Hadoop data from aircraft sensors, unifying and simplifying the process for evaluating device health across an aircraft
  • Central interface for daily management of real-time flight data

Read the full China Eastern Airlines on Oracle Cloud case study here.

Etsy is an e-commerce site for independent artisan sellers. With its goal of creating a buying and selling space that puts the individual first, Etsy wanted to move its platform to the cloud to keep up with needed innovations, but it didn’t want to lose the personal touches or values that drew customers in the first place. Etsy chose Google for cloud migration and big data management for several primary reasons: Google’s advanced features backing scalability, its commitment to sustainability, and the collaborative spirit of the Google team.

Mike Fisher, CTO at Etsy, explains how Google’s problem-solving approach won them over. 

“We found that Google would come into meetings, pull their chairs up, meet us halfway, and say, ‘We don’t do that, but let’s figure out a way that we can do that for you.'”

Industries: Retail, E-commerce

Use cases: Data center migration to the cloud, accessing collaboration tools, leveraging machine learning (ML) and artificial intelligence (AI), sustainability efforts

  • 5.5 petabytes of data migrated from existing data center to Google Cloud
  • >50% savings in compute energy, minimizing total carbon footprint and energy usage
  • 42% reduction in compute costs, with improved cost predictability, through virtual machine (VM), solid-state drive (SSD), and storage optimizations
  • Democratization of cost data for Etsy engineers
  • 15% of Etsy engineers moved from system infrastructure management to customer experience, search, and recommendation optimization

Read the full Etsy on Google Cloud case study here.

mLogica is a technology and product consulting firm that wanted to move to the cloud in order to better support its customers’ big data storage and analytics needs. While keeping its existing data analytics platform, CAP*M, mLogica relied on SAP HANA Cloud to move from on-premises infrastructure to a more scalable cloud structure.

“More and more of our clients are moving to the cloud, and our solutions need to keep pace with this trend,” says Michael Kane, VP of strategic alliances and marketing at mLogica.

“With CAP*M on SAP HANA Cloud, we can future-proof clients’ data setups.”

Industry: Professional services

Use cases: Manage growing pools of data from multiple client accounts, improve slow upload speeds for customers, move to the cloud to avoid maintenance of on-premises infrastructure, integrate the company’s existing big data analytics platform into the cloud

  • SAP HANA Cloud launched as the cloud platform for CAP*M, mLogica’s big data analytics tool, to improve scalability
  • Data analysis now enabled on a petabyte scale
  • Simplified database administration and eliminated additional hardware and maintenance needs
  • Increased control over total cost of ownership
  • Migrated existing customer data setups through SAP IQ into SAP HANA, without having to adjust those setups for a successful migration

Read the full mLogica on SAP HANA Cloud case study here.



Big data analytics in Cloud computing: an overview

Figure 1: Volume of data/information created, captured, copied, and consumed worldwide from 2010 to 2024 (estimated) [3]

To get a glimpse of the amount of data generated on a daily basis, consider what different platforms produce. On the Internet there is a vast amount of information at our fingertips, and we add to the stockpile every time we look for answers from our search engines. As a result, Google now handles more than 40,000 searches every second (approximately 3.5 billion searches per day) [5]; by the time you read this, that number will have changed again. Social media, for its part, is a massive data producer.

People’s love affair with social media certainly fuels data creation. Every minute, Snapchat users share 527,760 photos, more than 120 professionals join LinkedIn, users watch 4,146,600 YouTube videos, 456,000 tweets are sent on Twitter, and Instagram users post 46,740 photos [5]. Facebook remains the largest social media platform: over 300 million photos are uploaded to it every day, and every minute more than 510,000 comments are posted and 293,000 statuses are updated.

The growth in the amount and variety of data has brought advantages but also challenges, as relational database management systems and other traditional systems struggle to process and analyze data at this scale. For this reason, the term ‘big data’ arose to describe not only the amount of data but also the need for new technologies and new ways of processing and analyzing it. Cloud Computing has facilitated data storage, processing, and analysis: using the cloud, we have access to almost limitless storage and computing power offered by different vendors. Cloud delivery models such as IaaS (Infrastructure as a Service) and PaaS (Platform as a Service) can help organizations across different sectors handle Big Data more easily and quickly. The aim of this paper is to provide an overview of how analytics of Big Data in Cloud Computing can be done. For this we use Google’s BigQuery, a serverless data warehouse with built-in machine learning capabilities; it is very robust and has plenty of features to help with the analytics of data of different sizes and types.

What is big data?

Many authors and organizations have tried to define ‘Big Data’. According to [6], “Big Data refers to data volumes in the range of exabytes and beyond”. Wikipedia [7] defines big data as an accumulation of datasets so huge and complex that they become hard to process using database management tools or traditional data processing applications, with challenges that include capture, storage, search, sharing, transfer, analysis, and visualization.

Sam Madden of the Massachusetts Institute of Technology (MIT) considers “Big Data” to be data that is too big, too fast, or too hard for existing tools to process [8]. ‘Too big’ means data at the petabyte level and coming from various sources; ‘too fast’ means data that grows quickly and must also be processed quickly; ‘too hard’ means the difficulty that arises when data does not fit the existing processing tools [9]. In PCMag, one of the most popular journals on technological trends, big data refers to the massive amounts of data collected over time that are difficult to analyze and handle using common database management tools [10]. There are many other definitions of Big Data, but these are enough to form an impression of the concept.

Features and characteristics of big data

One question researchers have struggled to answer is what qualifies as ‘big data’. For this reason, in 2001 industry analyst Doug Laney of Gartner introduced the 3V model, three features data must exhibit to be considered “big data”: volume, velocity, and variety. Volume is the characteristic that determines the size of data, usually reported in terabytes or petabytes. For example, social networks like Facebook store, among other things, their users’ photos. Due to its large number of users, it is estimated that Facebook stores about 250 billion photos and over 2.5 trillion posts. This is an extremely large amount of data that needs to be stored and processed. Volume is the most representative feature of ‘big data’ [8]. In terms of volume, tera- or peta-scale data is usually considered ‘big’, although this depends on the capacity of those analyzing the data and the tools available to them [8]. Figure 2 shows what each of the three V’s represents.

Figure 2: The 3 V’s of Big Data [6]

The second characteristic is velocity. This refers to the rate at which data is generated, or the speed at which it must be processed and analyzed [8]. For example, Facebook users upload more than 900 million photos a day, which works out to roughly 10,400 photos per second; Facebook needs to process, store, and retrieve this information for its users in real time. Figure 3 shows some statistics from [11] on the speed of data generation from different sources. As can be seen, social media and the Internet of Things (IoT) are the largest data generators, with a growing trend.

Figure 3: Examples of the velocity of Big Data [9]
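A quick back-of-the-envelope check of the Facebook upload rate quoted above:

```python
# Sanity check of the rate implied by 900 million uploads per day.
photos_per_day = 900_000_000
seconds_per_day = 24 * 60 * 60           # 86,400
print(photos_per_day / seconds_per_day)  # ~10,417 photos per second
```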

There are two main types of data processing: batch and stream. In batch, processing happens on blocks of data that have been stored over a period of time. Data processed in batches is usually big, so it takes longer to process. Hadoop MapReduce is considered the best framework for processing data in batches [11]. This approach works well in situations where real-time analytics is not needed and where it is important to process large volumes of data to get detailed insights.

Stream processing, on the other hand, is key to processing and analyzing data in real time: data is processed as it arrives and fed immediately into analytics tools, so results are generated instantly. Many scenarios benefit from this approach, such as fraud detection, where anomalies that signal fraud are detected in real time, or online retail, where real-time processing lets a retailer compile a running history of customer interactions and recommend additional purchases on the spot [11].
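A toy sketch of the contrast, using a stream of purchase events; the names and the alert threshold are invented for illustration. Batch aggregates a stored block in one pass, while stream processing updates state and can raise alerts as each event arrives.

```python
from collections import defaultdict

events = [("alice", 30.0), ("bob", 12.5), ("alice", 7.5)]

def batch_totals(stored_events):
    """Batch: aggregate a stored block of events in one pass."""
    totals = defaultdict(float)
    for customer, amount in stored_events:
        totals[customer] += amount
    return dict(totals)

def stream_totals(event_source, alert_threshold=25.0):
    """Stream: update state per event, flag anomalies the moment they appear."""
    totals = defaultdict(float)
    for customer, amount in event_source:
        totals[customer] += amount
        if amount > alert_threshold:          # e.g. a fraud-style alert
            print(f"alert: large purchase by {customer}: {amount}")
        yield customer, totals[customer]      # result emitted immediately

print(batch_totals(events))                   # {'alice': 37.5, 'bob': 12.5}
for update in stream_totals(iter(events)):
    print(update)
```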

The third property is variety, which refers to the different types of data generated from different sources. ‘Big Data’ is usually classified into three major categories: structured data (transactional data, spreadsheets, relational databases, etc.), semi-structured data (Extensible Markup Language (XML), web server logs, etc.), and unstructured data (social media posts, audio, images, video, etc.). The literature also mentions a fourth category, metadata: data about data. This is shown in Fig. 4. Most data today, around 80%, belongs to the unstructured category [11].

Figure 4: Main categories of data variety in Big Data [9]

Over time, the three features of big data have been complemented by two additional ones: veracity and value. Veracity is equivalent to quality: data that is clean and accurate and has something to offer [12]. The concept is also related to the reliability of the data being extracted (e.g., customer sentiment on social media is not highly reliable data). Value refers to the social or economic value data can generate; the degree of value data can produce also depends on the knowledge of those who make use of it.

Big data analytics in cloud computing

Cloud Computing is the delivery of computing services such as servers, storage, databases, networking, software, and analytics over the Internet (“the cloud”), with the aim of providing flexible resources, faster innovation, and economies of scale [13]. Cloud computing has revolutionized the way computing infrastructure is abstracted and used, and cloud paradigms have been extended to cover anything that can be delivered as a service (“X as a Service”). The many benefits of cloud computing, such as elasticity, pay-as-you-go or pay-per-use pricing, and low upfront investment, have made it a viable and desirable choice for big data storage, management, and analytics [13]. Because big data is now considered vital for many organizations and fields, service providers such as Amazon, Google, and Microsoft offer their own big data systems in a cost-efficient manner. These systems offer scalability for businesses of all sizes, which has led to the prominence of the term Analytics as a Service (AaaS): a faster and more efficient way to integrate, transform, and visualize different types of data.

Big data analytics cycle

According to [14], processing big data for analytics differs from processing traditional transactional data. In traditional environments, data is first explored, then a model design and a database structure are created. Figure 5 depicts the flow of big data analysis. As can be seen, it starts by gathering data from multiple sources, such as files, systems, sensors, and the Web. This data is stored in the so-called “landing zone”, a medium capable of handling the volume, variety, and velocity of data, usually a distributed file system. After the data is stored, different transformations are applied to it to preserve efficiency and scalability, and it is then integrated into particular analytical tasks, operational reporting, databases, or raw data extracts [14].

Figure 5: Flow in the processing of Big Data [11]

Moving from ETL to ELT paradigm

ETL (Extract, Transform, Load) takes data from a data source, applies whatever transformations are required, and then loads the result into a data warehouse, where reports and queries are run against it. The downside of this paradigm is that it is characterized by a lot of I/O activity, string processing, variable transformation, and data parsing [15].

ELT (Extract, Load, Transform) takes the most compute-intensive activity, transformation, and performs it not on an on-premises server that is already under pressure from regular transaction handling, but in the cloud [15]. This removes the need for data staging, because the warehousing solution is used for all types of data: structured, semi-structured, unstructured, and raw. The approach employs the concept of “data lakes”, which differ from OLAP (Online Analytical Processing) data warehouses in that they do not require data to be transformed before loading [15]. Figure 6 illustrates the differences between the two paradigms; as can be seen, the main difference is where the transformation takes place.

Figure 6: Differences between ETL and ELT [15]

ELT has many benefits over the traditional ETL paradigm. The most crucial, as mentioned, is that data of any format can be ingested as soon as it becomes available. Another is that only the data required for a particular analysis needs to be transformed; with ETL, the entire pipeline and the structure of the data in the OLAP warehouse may require modification if the previous structure does not allow for a new type of analysis [16].
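To make the contrast concrete, here is an illustrative Python sketch of the two orderings; the connection string, file, and table names are hypothetical, not from the sources above. In the ETL branch the transformation runs in the pipeline before loading; in the ELT branch raw rows are loaded first and transformed inside the warehouse with SQL.

```python
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:pass@warehouse-host/db")  # placeholder

# --- ETL: transform in the pipeline, then load the finished table ---
raw = pd.read_csv("orders.csv")
cleaned = (
    raw.dropna(subset=["order_id"])              # transformation happens here,
       .assign(total=lambda d: d.qty * d.price)  # before anything is loaded
)
cleaned.to_sql("orders_clean", engine, if_exists="replace", index=False)

# --- ELT: load the raw data as-is, transform later inside the warehouse ---
raw.to_sql("orders_raw", engine, if_exists="replace", index=False)
with engine.begin() as conn:
    conn.execute(text("""
        CREATE TABLE orders_clean_elt AS
        SELECT order_id, qty * price AS total
        FROM orders_raw
        WHERE order_id IS NOT NULL
    """))
```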

Some advantages of big data analytics

As mentioned, companies across various sectors leverage Big Data to promote data-driven decision making. Beyond the tech industry, the usage and popularity of Big Data has expanded to healthcare, governance, retail, supply chain management, education, and more. Some of the benefits of Big Data Analytics mentioned in [17] include:

Data accumulation from different sources including the Internet, online shopping sites, social media, databases, external third-party sources etc.

Identification of crucial points that are hidden within large datasets in order to influence business decisions.

Identification of the issues regarding systems and business processes in real time.

Facilitation of service/product delivery to meet or exceed client expectations.

Responding to customer requests, queries and grievances in real time.

Some other benefits according to [ 16 ] are related to:

Cost optimization – One of the biggest advantages of Big Data tools such as Hadoop or Spark is that they offer cost advantages for storing, processing, and analyzing large amounts of data. The authors cite the logistics industry to highlight the cost-reduction benefits of Big Data: there, the cost of product returns is 1.5 times the actual shipping cost. With Big Data Analytics, companies can minimize product-return costs by predicting the likelihood of returns, estimating which products are most likely to be returned, and taking suitable measures to reduce losses on returns (a sketch of this idea follows this list).

Efficiency improvements – Big Data can improve operational efficiency by a wide margin. Big Data tools can amass large amounts of useful customer data by interacting with customers and gathering their feedback. The data can then be analyzed and interpreted to extract meaningful hidden patterns, such as customer tastes and preferences or buying behaviors, which in turn lets companies create personalized or tailored products and services.

Innovation - Insights from Big Data can be used to tweak business strategies, develop new products/services, optimize service delivery, improve productivity etc. These can all lead to more innovation.
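As promised above, an entirely illustrative sketch of the return-prediction idea: the sources do not specify a model, so this simply fits a minimal classifier over invented order features to score the likelihood that an order will be returned.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: [discount_pct, past_size_mismatches, days_to_deliver] (invented)
X = np.array([[0.0, 0, 2], [0.4, 3, 7], [0.1, 1, 3], [0.5, 4, 9]])
y = np.array([0, 1, 0, 1])  # 1 = order was returned

model = LogisticRegression().fit(X, y)

# High-risk orders can trigger interventions (better sizing info, packaging
# checks) before shipping, cutting the cost of returns.
print(model.predict_proba([[0.3, 2, 6]])[0, 1])
```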

As seen, Big Data Analytics has mostly been leveraged by businesses, but other sectors have also benefited. In healthcare, for example, many states now use the power of Big Data to predict and prevent epidemics, cure diseases, and cut costs, and the data has been used to establish many efficient treatment models: with Big Data, more comprehensive reports are generated and then converted into relevant critical insights that provide better care [17].

In education, Big Data has also been used extensively, enabling teachers to measure, monitor, and respond in real time to students’ understanding of the material. Professors have created tailor-made materials for students with different knowledge levels to increase their engagement [18].

Case study: Google’s BigQuery for data processing and analytics

Google Cloud Platform (GCP) contains a number of services designed to analyze and process big data. Here we describe and discuss the architecture and main components of BigQuery, one of the most used big data processing tools on GCP. BigQuery is a fully managed, serverless data warehouse that enables scalable analysis over petabytes of data. It is a Platform as a Service (PaaS) that supports querying using ANSI SQL and has built-in machine learning capabilities. Since its launch in 2011 it has gained a lot of popularity, and many big companies use it for their data analytics [19].

From a user perspective, BigQuery has an intuitive user interface that can be accessed in several ways depending on user needs. The simplest way to interact with the tool is its graphical web interface, shown in Fig. 7. Slightly more involved but faster approaches include the cloud console and the BigQuery APIs. As Fig. 7 shows, the BigQuery web interface offers options to add or select datasets, construct and schedule queries, transfer data, and display results.

Figure 7: BigQuery interface

Data processing and query construction take place in the SQL workspace section. BigQuery offers a rich SQL syntax for computing over and processing large sets of data, and it operates on relational datasets with well-defined structure: tables with specified columns and types. Figure 8 shows a simple query and highlights its execution details. The data displayed under the query results shows the main performance components of the executed query: elapsed time, consumed slot time, size of data processed, and average and maximum wait, write, and compute times. The query in Fig. 8 combines three datasets containing COVID-19 reported cases, deaths, and recoveries from more than 190 countries, covering 2020 through January 2021. BigQuery is flexible in that it lets you use and combine various datasets suitable for your task easily and with small delays. It contains an ever-growing list of public datasets and also offers options to create, edit, and import your own. Figure 9 shows the process of adding a table to a newly created dataset. As the source for table creation we used a local CSV file; this file is used to create the table schema and populate it with data. Aside from local upload, the source for a new table can be Google BigTable, Google Cloud Storage, or Google Drive. The newly created table, with its respective data, is then ready to be used to construct queries and obtain new insights, as shown in Fig. 8.

Figure 8: BigQuery execution details

Figure 9: Adding a table to the created dataset
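For readers who prefer code to the console, a minimal sketch of the same workflow through the google-cloud-bigquery Python client library is shown below. The public COVID-19 table name is an assumption (one of Google’s public datasets), and the destination table is hypothetical; substitute tables your project can access.

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

sql = """
    SELECT country_region, SUM(confirmed) AS total_confirmed
    FROM `bigquery-public-data.covid19_jhu_csse.summary`
    WHERE date = '2021-01-01'
    GROUP BY country_region
    ORDER BY total_confirmed DESC
    LIMIT 10
"""

# Run the query and stream back the result rows.
for row in client.query(sql).result():
    print(row.country_region, row.total_confirmed)

# Programmatic counterpart of the web UI's "save results": write the query
# output to a destination table instead of the screen.
job_config = bigquery.QueryJobConfig(
    destination="my-project.my_dataset.top_countries",  # hypothetical table
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
client.query(sql, job_config=job_config).result()
```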

One advantage of using imported data in the cloud is the ability to manage its access and visibility at the scope of the cloud project and its members. Depending on how it is used, queried data can be saved directly to the local computer through the “save results” option shown in Fig. 8, which offers a variety of formats and data extensions to choose from, or explored in different configurations using the “explore data” option. You can also save constructed queries for later use, or schedule a query to run at an interval so data is delivered regularly through API endpoints. Figure 10 shows how much the average compute time increases with the size of the dataset used.

Figure 10: Average compute time as a function of dataset size

Experiments with different dataset sizes

Before moving to data exploration, let us analyze BigQuery’s performance on simple queries over datasets of varying size. Table 1 shows the query execution details of five simple SELECT queries run on five different datasets. The results are displayed across six performance categories; the data shows a correlation between the size of a dataset and its average read, write, and compute times.

From the graph we see that the dependence of average compute time on dataset size is exponential: as the data size increases, the average compute time increases exponentially.

Data returned from constructed queries, aside from being displayed in simple tabular form or as a JSON object, can also be transferred to Data Studio, an integrated tool for displaying and visualizing the gathered information. One way of displaying the queried data from Fig. 8 with the Data Studio tool is shown in Fig. 11; in this case, a bar chart visualization was chosen.

Figure 11: Using Data Studio for data visualization

Conclusions

Big Data is not a new term, but it has gained the spotlight due to the huge amounts of data produced daily from different sources. Our analysis shows that big data is growing at a fast pace, bringing benefits but also challenges. Cloud Computing is considered the best solution for storing, processing, and analyzing Big Data, and companies like Amazon, Google, and Microsoft offer public services that facilitate dealing with it. The analysis also shows the multiple benefits Big Data analytics provides to many different fields and sectors, such as healthcare, education, and business. We also saw that the interaction of Big Data with Cloud Computing has shifted how data is processed and analyzed: traditional settings use ETL, whereas Big Data uses ELT, and the latter has clear advantages over the former.

From our case study we saw that BigQuery is very good at running complex analytical queries; there is little point in running queries that do simple aggregation or filtering. BigQuery is suitable for heavy queries that operate over big sets of data, and the bigger the dataset, the more likely it is to gain in performance compared with traditional relational databases, as BigQuery implements different parallel schemas to speed up execution time.

BigQuery does not favor joins; merging data into one table yields better execution times. It is well suited to scenarios where data does not change often, since it has a built-in cache, and it can also be used to reduce the load on a relational database, as it offers different options and configurations to improve query performance. Billing can be pay-as-you-go, where charges are based on usage, or flat-rate, which offers a specific slot rate and charges on a daily, monthly, or yearly plan.

Availability of data and materials

The datasets used during the current study are available from the corresponding author on reasonable request.

References

1. Hilbert M, Lopez P (2011) The world’s technological capacity to store, communicate and compute information. Science 332:60–65
2. Hellerstein J (2008) MapReduce leads the way for parallel programming. Gigaom. https://gigaom.com/2008/11/09/mapreduce-leads-the-way-for-parallelprogramming/. Accessed 20 Jan 2021
3. Statista (2020) Volume of data/information created worldwide. https://www.statista.com/statistics/871513/worldwide-data-created/. Accessed 21 Jan 2021
4. Reinsel D, Gantz J, Rydning J (2017) Data age 2025: the evolution of data to life-critical. International Data Corporation, Framingham
5. Marr B (2018) How much data do we create every day? The mind-blowing stats everyone should read. Forbes. https://www.forbes.com/sites/bernardmarr/2018/05/21/how-muchdata-do-we-create-every-day-the-mind-blowing-stats-everyone-shouldread/?sh=5936b00460ba
6. Kaisler S, Armour F, Espinosa J (2013) Big data: issues and challenges moving forward. Wailea, Maui, HI, pp 995–1004
7. Wikipedia (2018) Big data. https://www.en.wikipedia.org/wiki/Bigdata/. Accessed 4 Jan 2021
8. Gewirtz D (2018) Volume, velocity, and variety: understanding the three V’s of big data. ZDNet. https://www.zdnet.com/article/volume-velocity-and-varietyunderstanding-the-three-vs-of-big-data/. Accessed 1 Jan 2021
9. Weathington J (2012) Big Data defined. TechRepublic. https://www.techrepublic.com/article/big-data-defined/
10. PC Magazine (2018) Big data. http://www.pcmag.com/encyclopedia/term/62849/big-data. Accessed 9 Jan 2021
11. Akhtar SMF (2018) Big Data Architect’s Handbook. Packt
12. WhishWorks (2019) Understanding the 3 V’s of big data: volume, velocity and variety. https://www.whishworks.com/blog/data-analytics/understanding-the3-vs-of-big-data-volume-velocity-and-variety/. Accessed 23 Jan 2021
13. Yadav S, Sohal A (2017) Review paper on big data analytics in Cloud computing. Int J Comp Trends Technol (IJCTT) 49(3):156–160
14. Kimball R, Ross M (2013) The data warehouse toolkit: the definitive guide to dimensional modeling, 3rd edn. John Wiley & Sons
15. LaptrinhX (2018) Better, faster, smarter: ELT vs ETL. https://laptrinhx.com/better-faster-smarter-elt-vs-etl-2084402419/. Accessed 22 Jan 2021
16. Xplenty (2019) ETL vs ELT. https://www.xplenty.com/blog/etl-vs-elt/#. Accessed 20 Jan 2021
17. Forbes Tech Council (2019) Five benefits of big data analytics and how companies can get started. Forbes. https://www.forbes.com/sites/forbestechcouncil/2019/11/06/fivebenefits-of-big-data-analytics-and-how-companies-can-getstarted/?sh=7e1b901417e4. Accessed 13 Jan 2021
18. EDHEC (2019) Three ways educators are using big data analytics to improve the learning process. https://master.edhec.edu/news/three-ways-educators-are-using-bigdata-analytics-improve-learning-process#. Accessed 6 Jan 2021
19. Google Cloud (2020) BigQuery. https://cloud.google.com/bigquery. Accessed 5 Jan 2021


Acknowledgements

The authors would like to thank the colleagues and professors from the University of Prishtina for their insightful comments and suggestions, which helped improve the quality of the paper.

Funding

The authors declare that they have no funder.

Author information

Authors and Affiliations

Faculty of Electrical and Computer Engineering, Department of Computer Engineering, University of Prishtina, 10000, Prishtina, Kosovo

Blend Berisha, Endrit Mëziu & Isak Shabani


Contributions

Blend Berisha wrote the Introduction, "Features and characteristics of Big Data," and the Conclusions. Endrit Mëziu wrote "Big Data Analytics in Cloud Computing" and part of the case study. Isak Shabani contributed to the methodology and resources and supervised the work process. All authors prepared the figures and reviewed the manuscript. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Isak Shabani.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article

Berisha, B., Mëziu, E. & Shabani, I. Big data analytics in Cloud computing: an overview. J Cloud Comp 11, 24 (2022). https://doi.org/10.1186/s13677-022-00301-w


Received: 08 April 2022

Accepted: 24 July 2022

Published: 06 August 2022

DOI: https://doi.org/10.1186/s13677-022-00301-w


Keywords: Cloud computing
