Part 1 - Foundations
The first course of the Google Data Analytics Certificate introduces the data analytics process: what data is, and how analyzing it yields insights that help businesses make better data-driven decisions.
Data science - analysis - analytics
Data science creates new ways of modeling and understanding the unknown by using raw data. Data scientists use data to raise new questions, while data analysts derive insights from existing data to answer questions.
Data analytics is the science of data. It's a broad concept that covers everything from managing data to the methods and tools used by data scientists and other data workers.
The role of data in decision-making
You can use data to find facts that help guide business strategy. A data analyst can gather data, analyze it, and use it to find patterns and relationships. To get the best results, include insights from people who are familiar with the business problem, i.e., subject matter experts. They can help find inconsistencies, guide you through gray areas, and validate the choices being made.
Key skills for a data analyst
A data analyst should possess analytical thinking skills, i.e., the ability to solve problems using facts in an organized, step-by-step process. There are five key aspects to analytical thinking:
Visualization - the graphical representation of data
Strategy - having a strategic mindset is key to staying focused
Problem-orientation - data analysts use a problem-oriented approach to identify, describe, and solve problems
Correlation - keeping in mind that correlation doesn't equal causation
Big-picture and detail-oriented thinking - being able to see the big picture, as well as the small details
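The correlation point in the list above can be made concrete with a small sketch. The figures below are invented for illustration only, and the `pearson` helper is a hypothetical name, not something from the course. Ice cream sales and sunburn cases move together because both follow temperature, not because one causes the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical daily figures, invented for illustration only.
temperature = [18, 21, 24, 27, 30, 33]
ice_cream_sales = [120, 150, 185, 210, 250, 280]
sunburn_cases = [2, 3, 5, 6, 8, 10]

# A strong correlation between two variables that share a common driver.
r = pearson(ice_cream_sales, sunburn_cases)
print(f"ice cream vs sunburn: r = {r:.2f}")
print(f"temperature vs ice cream: r = {pearson(temperature, ice_cream_sales):.2f}")
```

A high coefficient here says only that the two series rise together. Deciding whether one causes the other needs domain knowledge, which is where subject matter experts come in.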
Core analytical skills
Data analysts often ask what the root cause of a problem is. The "five whys" technique helps here: you ask "why" five times, each time about the previous answer, until you reach the real reason.
Another useful method is gap analysis. You evaluate how a process currently works and define where you want it to be in the future. Then you identify the gaps between the current and future states and work out how to close them.
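As a minimal sketch of gap analysis, the current and target states can be written down as numbers and subtracted. The metric names and values below are hypothetical, chosen only to show the arithmetic.

```python
# Hypothetical metrics, invented for illustration only.
current_state = {
    "on_time_delivery_pct": 82.0,
    "avg_response_hours": 30.0,
    "customer_satisfaction": 3.9,
}
target_state = {
    "on_time_delivery_pct": 95.0,
    "avg_response_hours": 24.0,
    "customer_satisfaction": 4.5,
}


def gap_analysis(current, target):
    """Return target minus current for every metric in the target state."""
    return {metric: target[metric] - current[metric] for metric in target}


gaps = gap_analysis(current_state, target_state)
for metric, gap in gaps.items():
    print(f"{metric}: gap of {gap:+.1f}")
```

A positive gap means the metric has to rise to reach the target, and a negative gap means it has to fall. The numbers only locate the gaps; deciding how to close them is still a business question.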
Data life cycle stages
The data life cycle provides a common framework for how data is managed:
Plan - decide what kind of data is needed, how it will be managed, and who will be responsible for it
Capture - collect or bring in data from a variety of different sources
Manage - care for and maintain the data. This includes determining how and where it is stored and the tools used to do so
Analyze - use the data to solve problems, make decisions, and support business goals
Archive - keep relevant data stored for long-term and future reference
Destroy - remove data from storage and delete any shared copies of the data
Don't mix up the six stages of the data life cycle with the six steps of the data analysis process. They are not the same thing.
The six steps of data analysis
The data analysis process consists of six steps:
Ask questions and define the problem
Prepare data by collecting and storing the information
Process data by cleaning and checking the information
Analyze data to find trends, relationships, and patterns
Share data with your audience
Act on the data and use the results of the analysis
Ask
In this phase, you define the problem to be solved and make sure you fully understand stakeholders' expectations. The "five whys" method can be very helpful in this phase.
Prepare
In this phase, you collect and store data for the upcoming data analysis process.
Process
Here you find and eliminate errors and inconsistencies in the data. That means cleaning data, perhaps transforming it into a more useful format, combining datasets to make information more complete, and removing any outliers that could skew the information.
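The cleaning steps described above can be sketched with the Python standard library. The rows, the `clean` and `split_outliers` helpers, and the 1.5 x IQR rule are assumptions made for illustration; the course does not prescribe a specific outlier rule. The sketch also sets outliers aside for review instead of deleting them outright.

```python
from statistics import quantiles

# Hypothetical raw rows (order_id, city, amount), invented for illustration.
raw_rows = [
    (1, " Oslo", 20.0),
    (2, "oslo ", 21.0),
    (2, "oslo ", 21.0),     # duplicate order
    (3, "Bergen", None),    # missing amount
    (4, "BERGEN", 22.0),
    (5, "Oslo", 23.0),
    (6, "Bergen", 24.0),
    (7, "Oslo", 25.0),
    (8, "Bergen", 26.0),
    (9, "Oslo", 500.0),     # suspiciously large
]


def clean(rows):
    """Drop rows with missing amounts or repeated ids; normalize city names."""
    seen_ids = set()
    cleaned = []
    for order_id, city, amount in rows:
        if amount is None or order_id in seen_ids:
            continue
        seen_ids.add(order_id)
        cleaned.append((order_id, city.strip().title(), amount))
    return cleaned


def split_outliers(rows):
    """Separate rows whose amount falls outside 1.5 * IQR of the quartiles."""
    amounts = [amount for _, _, amount in rows]
    q1, _, q3 = quantiles(amounts, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    typical = [row for row in rows if low <= row[2] <= high]
    outliers = [row for row in rows if not low <= row[2] <= high]
    return typical, outliers


cleaned_rows = clean(raw_rows)
typical_rows, outlier_rows = split_outliers(cleaned_rows)
print(f"{len(raw_rows)} raw rows, {len(cleaned_rows)} after cleaning")
print("flagged as outliers:", outlier_rows)
```

Keeping the flagged rows in a separate list makes it possible to check whether an extreme value is an error or a genuine observation before removing it.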
Analyze
In this phase, you use tools to organize and transform data in order to draw useful insights and drive informed decision-making. Data analysts rely on powerful tools in their work, e.g., spreadsheets and SQL.
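Organizing and transforming data often means grouping records and computing summaries, much like a pivot table in a spreadsheet. The sketch below does this in plain Python; the sales records and the `summarize` helper are hypothetical and exist only to illustrate the idea.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sales records (region, quarter, amount), invented for illustration.
sales = [
    ("North", "Q1", 100),
    ("North", "Q2", 120),
    ("North", "Q3", 140),
    ("South", "Q1", 80),
    ("South", "Q2", 70),
    ("South", "Q3", 60),
]


def summarize(records):
    """Group amounts by region and compute the total and average per region."""
    by_region = defaultdict(list)
    for region, _, amount in records:
        by_region[region].append(amount)
    return {
        region: {"total": sum(amounts), "average": mean(amounts)}
        for region, amounts in sorted(by_region.items())
    }


summary = summarize(sales)
for region, stats in summary.items():
    print(f"{region}: total={stats['total']}, average={stats['average']}")
```

Even this small summary surfaces a pattern worth investigating: in the invented data, one region grows each quarter while the other declines.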
Share
Here you'll share the results of your work so stakeholders can make effective data-driven decisions. In this phase, you'll use visualization tools to make complex concepts and facts easier to understand.
Act
In the final phase, businesses will take all your insights and put them to work to solve the original business problem.
The data analysis toolbox
The most common tools used by data analysts include spreadsheets like Excel, databases queried with SQL (Structured Query Language), and visualization tools.
By putting the data in a spreadsheet, you can see patterns, group data, and easily find the information you need. With SQL, data analysts can access the data they need by making requests to a database. Finally, data analysts use visualization to represent information and communicate their insights in a compelling way. Some data analysts also use programming languages such as Python or R for data analysis and visualization.
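To show what "making requests to a database" looks like, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The `orders` table, its columns, and its rows are hypothetical; a real analysis would connect to an existing database instead of creating one.

```python
import sqlite3

# An in-memory database, so the example leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

# Hypothetical rows, invented for illustration only.
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("Ada", 40.0), ("Ben", 15.5), ("Ada", 10.0), ("Cleo", 60.0)],
)

# The request: total spend per customer, largest first.
rows = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
    """
).fetchall()
conn.close()

for customer, total in rows:
    print(f"{customer}: {total}")
```

The SELECT statement is the part that carries over to other databases; the connection setup differs from system to system.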
The importance of fairness
Data analysts need to ensure their analysis doesn't create or reinforce bias. Conclusions based on data can sometimes be both true and unfair, so as a data analyst it's your job to make sure your analysis is fair and accounts for the social context that could create bias in your conclusions.