
Analyzing Data with Power BI

 

Introduction to Power BI

 

 

By Anthony S. Williams


Please Check Out My Other Books Before You Continue

 

Below you will find my other books, which are also popular on Amazon and Kindle. Simply click on a link below to check them out.

 


Deep Learning with Keras

 


Convolutional Neural Networks in Python

 


Data Analytics for Beginners

If the links do not work, for whatever reason, you can simply search for these titles on the Amazon website to find the book.


© Copyright 2017 by Anthony S. Williams - All rights reserved.

Respective authors own all copyrights not held by the publisher.

The following publication is reproduced below with the goal of providing information that is as accurate and reliable as possible. Regardless, purchasing this publication can be seen as consent to the fact that both the publisher and the author of this book are in no way experts on the topics discussed within and that any recommendations or suggestions that are made herein are for informational purposes only. Professionals should be consulted as needed prior to undertaking any of the action endorsed herein.

This declaration is deemed fair and valid by both the American Bar Association and the Committee of Publishers Association and is legally binding throughout the United States.

Furthermore, the transmission, duplication or reproduction of any of the following work including specific information will be considered an illegal act irrespective of whether it is done electronically or in print. This extends to creating a secondary or tertiary copy of the work or a recorded copy and is only allowed with express written consent from the Publisher. All additional rights reserved.

The information in the following pages is broadly considered to be a truthful and accurate account of facts, and as such any inattention, use or misuse of the information in question by the reader will render any resulting actions solely under their purview. There are no scenarios in which the publisher or the original author of this work can in any fashion be deemed liable for any hardship or damages that may befall the reader after acting on the information described herein.

Additionally, the information in the following pages is intended only for informational purposes and should thus be thought of as universal. As befitting its nature, it is presented without assurance regarding its prolonged validity or interim quality. Trademarks that are mentioned are done without written consent and can in no way be considered an endorsement from the trademark holder.

All trademarks and brands referenced in this book are used for clarifying purposes only and are owned by their respective owners, who are not affiliated with this document.


Table of Contents

Introduction

Data Analysis Methods

Chapter 1 Data Analytics Process

Quantitative Data Analytics

Methods Used for Analyzing Quantitative Data

Data Analysis Tasks

Chapter 2 Fundamentals of Data Modeling

The Main Aim of Data Models

Types of Data Models

Database Models

Chapter 3 Getting Started with Power BI

Power BI Service

Chapter 4 Analyzing and Visualizing Data with Power BI

Connecting Data Sources

Data Transformation and Data Cleaning

Modeling Your Data

Chapter 5 Applications of Data Analysis

Big Data Applications in Real Life

Conclusion


Introduction

 

Data analysis is the process of evaluating data using logical and analytical reasoning in order to examine every single component of the data provided. This analysis is just one step that must be completed while conducting any research project. The first step is to gather data from various sources and then to analyze it, forming some conclusion or finding. There are different kinds of data analysis methods, including text analytics, data mining, data visualization, and business intelligence.

 

 

Data analysis is also known as data analytics, and the most common definition is that data analytics is, in the first place, a process of cleansing, transforming, and modeling data with one goal. The primary aim of any data analytics is to discover relevant and useful information, to come to a conclusion, and to support decision-making. The approaches used in data analytics can vary, and some techniques suit certain data while others do not, so the method used will depend on the data type in the first place. Data analytics has various facets, and a wide range of techniques is used under different names in various domains such as business, science, and social science.

 

Data Analysis Methods

 

As we mentioned at the beginning, there are different methods for delivering a data analysis conclusion, and one of them is data mining. Data mining is a data analytics method which focuses mostly on modeling and knowledge discovery in order to make predictions, rather than being used just for descriptive purposes. Data mining is in fact the computing process of discovering patterns in large data collections. It sits at the intersection of database systems, machine learning, and statistics; therefore, we can conclude that data mining is a significant subfield of computer science.

 

Data Mining

 

When it comes to the goals of data mining, the overall goal of every data mining process is to discover valuable information in a provided data collection and extract it. This discovered information is then transformed into an easily understandable form for further use. Besides the raw analysis at its core, data mining also involves data management, data pre-processing, interestingness metrics, inference considerations, and complexity considerations.

 

Post-processing of the discovered structures and their visualization are also key to successful data analytics. It is fair to say that data mining is a major step in data analytics: the knowledge discovery step applied to data collections.

 

[Figure: data mining at the intersection of artificial intelligence, statistics, and machine learning]

 

So the ultimate goal of data mining is to gain profound knowledge of the provided data and to extract patterns from large collections of it. We can also say that data mining is a buzzword which is often applied to any form of large-scale data and information processing: collection, extraction, warehousing, and statistics.

 

Any application of support systems in computer technology, such as artificial intelligence, business intelligence, and machine learning, needs data mining techniques and methods. The term data mining is sometimes used loosely for any large-scale data analytics, but when it comes to the actual techniques, terms like machine learning and artificial intelligence are often more appropriate.

 

The data mining process is a semi-automatic analysis of a large collection of data whose aim is to extract previously unknown, distinct patterns: groups of data records (cluster analysis), unusual records (anomaly detection), and dependencies (sequential pattern mining). This usually requires certain techniques, like spatial indices.
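As a concrete illustration, here is a minimal Python sketch of cluster analysis and simple anomaly detection, assuming scikit-learn and NumPy are available; the records are synthetic, invented for the example:

  # Minimal sketch: cluster analysis plus anomaly detection on synthetic records.
  import numpy as np
  from sklearn.cluster import KMeans

  rng = np.random.default_rng(0)
  records = np.vstack([
      rng.normal(0.0, 0.5, size=(50, 2)),   # first group of records
      rng.normal(5.0, 0.5, size=(50, 2)),   # second group of records
      [[10.0, 10.0]],                       # one unusual record (an anomaly)
  ])

  kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(records)

  # Records far from their assigned cluster centre are flagged as anomalies.
  centres = kmeans.cluster_centers_[kmeans.labels_]
  distances = np.linalg.norm(records - centres, axis=1)
  threshold = distances.mean() + 3 * distances.std()
  print("anomalous rows:", np.where(distances > threshold)[0])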

 

The patterns obtained may be seen as a kind of summary of the data provided and may be used in further analysis; for example, this is very common in the predictive analytics used in machine learning. Furthermore, the data mining process may identify groups in the data that can be used to obtain more accurate probability estimates by another technique, such as a decision support system.

 

It should be noted that neither data preparation, interpretation of results, nor data collection is part of the data mining step itself, but they certainly belong to the overall data analytics process as additional steps toward a conclusion.

 

Terms such as data fishing, data dredging, and data snooping are often used to describe data mining methods that sample parts of a larger data collection that are too small for reliable statistical inferences about the validity of any patterns discovered. However, these methods are very often used to design new hypotheses which can then be tested against the larger data collections.

 

When it comes to statistical applications, data analytics can be divided into descriptive statistics, exploratory data analytics, and confirmatory data analytics. Exploratory data analytics is focused on discovering new features in the data, while confirmatory data analytics is focused on confirming or falsifying existing hypotheses.

 

Predictive analytics, also focused on the application of statistical models, is often used for predictive forecasting and classification. Text analytics, on the other hand, applies linguistic, statistical, and structural techniques to classify and extract information from textual sources, a species of unstructured data.

 

Data analytics is closely connected to data visualization, data integration, and data dissemination; the term data analytics is therefore often used to cover data modeling as well. Text analytics, also known as text mining, is the process of gathering high-quality information from text sources. This high-quality information is typically derived by devising patterns and trends through means such as statistical pattern learning.

 

Text Mining

 

Text analytics often involves structuring the input text. Parsing is very often used, meaning that some derived linguistic features are added while others are removed, and the resulting structured features are inserted into a database.

Text analytics then derives patterns within the structured data using these linguistic models. The final step is evaluation and interpretation of the result, or output.

 

[Figure: text mining draws on computational linguistics, inferential statistics, and machine learning]

 

"High quality" in text mining means that certain qualities are combined, such as novelty, relevance, and interestingness. Text analytics processes often include methods such as text clustering, text categorization, document summarization, entity or concept extraction, production of granular taxonomies, learning relations between named entities, and sentiment analysis.

 

Text mining also involves various techniques like lexical analysis, information retrieval, information extraction, annotation, and pattern recognition, all of which help to study, for example, the frequency distributions of words in a text source.

 

The overall goal of any text analysis is to turn a text source into structured data which can be further analyzed via different applications, including natural language processing and other analytical methods. Text mining also feeds predictive analytics and data visualization.

So text mining has a predictive purpose, and a typical application is to gather and scan a collection of documents written in natural language and then model them for predictive classification.
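As an illustration of this predictive purpose, here is a minimal Python sketch, assuming scikit-learn is available; the tiny corpus and its labels are invented:

  # Minimal sketch: restructure raw text into term counts, then classify it.
  from sklearn.feature_extraction.text import CountVectorizer
  from sklearn.naive_bayes import MultinomialNB

  docs = [
      "stock prices rallied after strong earnings",
      "the team won the championship game",
      "markets fell on inflation fears",
      "a late goal decided the match",
  ]
  labels = ["finance", "sports", "finance", "sports"]

  vectorizer = CountVectorizer()              # text -> structured term counts
  features = vectorizer.fit_transform(docs)

  model = MultinomialNB().fit(features, labels)
  print(model.predict(vectorizer.transform(["earnings beat the forecast"])))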

 

Data Visualization

 

Data visualization is another data analytics method, often described as equivalent to visual communication. Data visualization involves the design and study of visual representations of the data gathered. In other words, data visualization means that the information that has been gathered is abstracted into schematic form, including attributes and variables for each unit of information.

 

The main goal of data visualization is to communicate information efficiently and clearly via plots, statistical graphics, and information graphics. Numerical data can be encoded using lines, bars, and dots in order to communicate a quantitative message visually. Effective and clear data visualization helps users analyze evidence and data, and it makes even complex data collections accessible, usable, and easily understandable.

 

[Figure: data visualization balances function, integrity, interestingness, and form]

 

Users often face analytical tasks such as understanding causality or making comparisons, and the design principles of the graphic should follow the specified task. Such graphics may show comparisons or causality between variables. Tables are used where a user needs to look up specific measurements, while charts of various kinds are used to show relationships and patterns in the data for one or more variables.

 

It should be noted that data visualization is both science and art, and it is often described as a branch of descriptive statistics as well as a grounded tool for developing data analytic insight. Data created by Internet activity, and by the expanding number of sensors in the environment (the Internet of things), is increasing rapidly, and this big data is a major subject of data visualization.

 

Communicating, processing, and analyzing these large collections of data present analytical and ethical challenges for this branch of data analytics. Data scientists are often called on to help solve these challenges, and the overall discipline is referred to as data science.

 

Business Intelligence

 

Business intelligence is a significant part of data analytics, often referred to simply as BI. Business intelligence comprises the strategies, processes, applications, data, technologies, and technical architectures that are used by enterprises to support the collection, analysis, presentation, and dissemination of business information.

 

Business intelligence plays a significant role in providing both current and predictive views of the numerous business operations a company performs today, as well as those it performed in the past.

 

The most common goals and functions of business intelligence include reporting, online analytical processing, data mining and process mining, benchmarking, business performance management, text mining, complex event processing, predictive analytics, and prescriptive analytics.

 

Methods and technologies which are involved in business intelligence can handle a large collection of both structured and unstructured data in order to help identify, create and develop new strategies and new business opportunities.

 

The main goal of business intelligence within data analytics is to enable easy interpretation of large data collections. Business intelligence can provide businesses with a competitive market advantage by identifying better opportunities and by implementing effective techniques and strategies based on previous insights. Effective strategies within a very competitive market also provide long-term stability.

 

 

Business intelligence, as a very efficient method of business-related data analytics, is commonly used by enterprises to support a wide range of business decisions, from strategic to operational. The most common operational decisions include product pricing and product positioning on the market.

 

Strategic decisions related to business involve priorities, aims, and directions at the broadest level. In most cases, business intelligence is most effective when it combines data derived from the market in which a firm operates (external data) with data from internal company sources such as operations data and financial data.

To get a more complete overall picture, internal and external data are combined; together these two sources create intelligence which cannot be gathered from any single collection of data.

 

Business intelligence is also widely used as a tool for empowering organizations to gain better insight into current and new markets, to assess the suitability of and demand for services and products in different market segments, and to measure the impact of marketing efforts.

 

Business intelligence applications often use data gathered from a data warehouse or from a data mart, and the two concepts are sometimes combined as "BIDW". A data warehouse facilitates decision support because it contains copies of analytical data.

 

Business intelligence turns data sources into business solutions, including:

  • Better understanding of the business
  • Reduced risk of bottlenecks
  • Identification of waste in the system
  • Improved decision-making processes
  • Easy sharing of and access to information
  • Real-time data analysis

 


Chapter 1 Data Analytics Process

 

Data analysis means breaking a whole into its components so that they can be examined separately. Data analytics is, in fact, a process for obtaining the necessary data components and transforming them into information which is useful and significant for decision-making. In other words, data is analyzed with the main aim of answering important questions, testing a hypothesis, or disproving a theory.

 

Data analytics can be defined as a process for analyzing large datasets, together with methods for interpreting the results of such procedures and ways of planning the gathering of data to make its analysis more precise, easier, and more accurate. All the results, tools, and machinery of mathematical statistics are applied to analyzing data. The process of data analysis involves several distinct phases, and feedback from later phases often results in additional work in earlier ones.

 

Data analysis requires certain steps toward an outcome or solution. The first step concerns data requirements, since the data which will be input to the analysis is specified based on the requirements of those directing the analysis or of the customers who will use its product. The general type of entity upon which the data will be gathered is called an experimental unit; for example, an experimental unit may be a person or a population of people who will use the product of the analysis. Specific variables regarding that population, such as income and age, may be specified and obtained. The data can be either categorical or numerical.

 

Data requirements:

  • Step 1: Obtain existing relevant information
  • Step 2: Consider obtained information needs
  • Step 3: Specify information gaps
  • Step 4: Create new data and propose testing strategies

 

Data is obtained from various sources. The data requirements can be communicated by analysts to the keepers of the data, such as the information technology staff within an organization. The data can also be obtained from sensors placed in the environment, including satellites, traffic cameras, various recording devices, etc. Other methods of data collection include reading documentation, consulting online sources, and conducting interviews.

 

Methods of Data Collecting:

  • Direct or interview method
  • Indirect or questionnaire method
  • Registration method

 

The data gathered must be organized and processed for further analysis. For example, data processing may involve placing data into the rows and columns of a table; this produces structured data which can then be analyzed using statistical software or a spreadsheet.
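For instance, here is a minimal Python sketch of this structuring step using pandas, with invented records and hypothetical column names:

  # Minimal sketch: place raw records into a rows-and-columns structure.
  import pandas as pd

  raw_records = [
      ("Alice", 34, 52000),
      ("Bob", 29, 48000),
      ("Carol", 41, 61000),
  ]
  df = pd.DataFrame(raw_records, columns=["name", "age", "income"])
  print(df.describe())  # quick numerical summary of the structured data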

 

 

Once the data is organized and processed, it may turn out that the data is incomplete or contains errors and duplicates. The need for data cleaning is significant since these problems are frequent, and data cleaning solves them so the data can be entered and stored properly. In other words, data cleaning is the set of methods used for correcting and preventing errors which may occur during data processing. Common data cleaning tasks include identifying inaccurate data, record matching, assessing the overall quality of the obtained data, column segmentation, and deduplication.

 

Data problems like incomplete data and data errors can also be identified in financial information, where the totals of certain variables can be compared to independently published numbers which are believed to be reliable. Amounts which are unusual, such as those below or above predetermined thresholds, can be reviewed as well.

 

There are different kinds of data cleaning depending on the kind of data obtained, such as email addresses, phone numbers, customer records, employer records, etc. Quantitative data cleaning methods for outlier detection may be used to flag data that was likely entered incorrectly. For textual data, spelling checkers may be used to lessen the number of mistyped words, but it is harder to tell whether the words themselves are correct.

 

Data cleaning steps:

  • Import data
  • Merge datasets
  • Rebuild missing data
  • Standardise
  • Normalise
  • De-duplicate
  • Verify and enrich
  • Export data
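A few of these steps might look as follows in a minimal pandas sketch; the file names and column names are hypothetical:

  # Minimal sketch of some data cleaning steps with pandas.
  import pandas as pd

  old = pd.read_csv("customers_2016.csv")                       # import data
  new = pd.read_csv("customers_2017.csv")
  merged = pd.concat([old, new], ignore_index=True)             # merge datasets

  merged["age"] = merged["age"].fillna(merged["age"].median())  # rebuild missing data
  merged["email"] = merged["email"].str.strip().str.lower()     # standardise
  merged = merged.drop_duplicates(subset="email")               # de-duplicate

  merged.to_csv("customers_clean.csv", index=False)             # export data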

 

Once the data cleaning process is done, the data is ready to be analyzed. Data analysts can apply different methods and techniques, referred to as exploratory data analytics, to better understand the messages and information contained in the obtained data. This exploration can result in additional data requests or additional data cleaning, so activities like these can be iterative in nature. Descriptive statistics such as the median or average can be generated to help in understanding the data. Data visualization may also be used to examine the data in graphical form and to gain better insight into the information it contains.

 

Objectives of exploratory data analytics:

  • Discover patterns
  • Spot anomalies
  • Frame hypothesis
  • Check Assumptions
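A minimal Python sketch of such exploratory checks, continuing with the hypothetical cleaned file and columns from the previous example:

  # Minimal sketch: exploratory data analytics with descriptive statistics.
  import pandas as pd

  df = pd.read_csv("customers_clean.csv")

  print(df["income"].mean(), df["income"].median())   # descriptive statistics
  print(df.corr(numeric_only=True))                   # look for candidate patterns

  # Spot anomalies: rows more than three standard deviations from the mean.
  z = (df["income"] - df["income"].mean()) / df["income"].std()
  print(df[z.abs() > 3])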

 

Mathematical models and formulas, called algorithms, are often applied to the data in order to identify connections between variables, such as correlation and causation. Generally speaking, a mathematical model can be developed to evaluate an individual variable in the data based on another variable in the data, with residual error depending on the accuracy of the model. In other words, data equals model plus error.

 

Inferential statistics includes methods and techniques for measuring the connections between variables in the data. Regression analysis is often used as part of inferential statistics; for example, a regression model may be used to determine whether a change in advertising (the independent variable) explains the variation in sales (the dependent variable).

 

Speaking in mathematical terms, the dependent variable is sales and the independent variable is advertising. The model is constructed so that the model error is minimized when the model makes predictions over a given range of advertising values. Analysts can also build descriptive models in order to simplify the analysis and communicate the outcomes.
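A minimal Python sketch of this sales-on-advertising regression, with invented figures:

  # Minimal sketch: fit sales = slope * advertising + intercept + error
  # by least squares; the numbers are invented.
  import numpy as np

  advertising = np.array([10, 20, 30, 40, 50], dtype=float)  # independent variable
  sales = np.array([120, 190, 250, 330, 390], dtype=float)   # dependent variable

  slope, intercept = np.polyfit(advertising, sales, deg=1)
  residuals = sales - (slope * advertising + intercept)      # the model error term

  print("sales ~= %.1f * advertising + %.1f" % (slope, intercept))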

 

Data modeling:

  • Explore data
  • Condition data
  • Select variables
  • Balance data
  • Build data models
  • Validate
  • Deploy
  • Maintain
  • Define success

 

The next step is to build a data product: a computer application which takes data inputs and produces outputs, feeding them back into the environment. A data product may be based on a model or an algorithm. An example is an application which analyzes data about customers' purchasing history and recommends other products the customer might like.
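A toy version of such a data product might look like the following Python sketch; the purchase histories and the co-occurrence scoring are invented for illustration:

  # Minimal sketch: recommend items often bought together with a user's items.
  from collections import Counter
  from itertools import combinations

  purchases = [
      {"laptop", "mouse", "keyboard"},
      {"laptop", "mouse"},
      {"keyboard", "monitor"},
  ]

  co_occurs = Counter()
  for basket in purchases:
      for a, b in combinations(sorted(basket), 2):
          co_occurs[(a, b)] += 1
          co_occurs[(b, a)] += 1

  def recommend(history, k=2):
      scores = Counter()
      for owned in history:
          for (a, b), n in co_occurs.items():
              if a == owned and b not in history:
                  scores[b] += n
      return [item for item, _ in scores.most_common(k)]

  print(recommend({"laptop"}))  # e.g. ['mouse', 'keyboard']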

 

Once the data analysis step is over, the analyzed data can be reported in various formats to the users of that analysis in order to support their requirements. The users may have feedback that results in additional analysis; therefore, much of the data analytics cycle is iterative.

 

The analyst may also use data visualization to determine how best to communicate the outputs. Data visualization can help communicate the information to the users clearly and efficiently, using charts and tables as appropriate displays of the important information contained in the data. Tables are helpful to a user who is looking for specific numbers, while charts help to explain the quantitative information contained in the data.

 

Stages of data processing:

  • Input stage: data collection, data capture, encoding, data transmission
  • Processing stage: performing instructions, transforming raw data into information
  • Output stage: decoding, presenting data to users, data communications
  • Storage stage: storing data, retrieving data

 

Quantitative Data Analytics

 

There are eight kinds of quantitative messages which users may try to understand or communicate from a collection of data, and particular graphs are used to help communicate each of them. Users specifying requirements and analysts performing the data analysis should consider these messages during the overall process. The eight types of quantitative messages are the following:

 

  • Time-series: A single variable is captured over a period of time, such as the unemployment rate over a 20-year period. A line chart is often used to demonstrate the trend.

 

  • Ranking: Categorical subdivisions are ranked in ascending or descending order, such as a ranking of sales performance (the measure) by sales persons (the category, with each sales person a categorical subdivision) during a single period. A bar chart is often used to show the ranking, since it shows the comparison across the sales persons.

 

  • Part-to-whole: Categorical subdivisions are measured as a ratio to the whole, i.e., a percentage of 100%. In these situations, the best way to show a part-to-whole relation is a bar chart or a pie chart, since they correctly show the comparison of different ratios; for example, the market share represented by each competitor in a market.

 

  • Deviation: Categorical subdivisions are compared against a reference, such as a comparison of actual versus budget expenses for several departments of a business over a given period of time. A bar chart is often used to show the comparison of the actual amount versus the reference amount.

 

  • Frequency distribution: Shows the number of observations of a particular variable for a given interval, such as the number of months in which the stock market return falls between intervals such as 0-10%. A type of chart known as a histogram is used for this kind of analysis.

 

  • Correlation: A comparison between observations represented by two variables, in order to determine whether the variables tend to move in the same or opposite directions. An example is plotting inflation and unemployment over a certain period. A scatter plot is mostly used to show correlation.

 

  • Nominal comparison: Used for comparing categorical subdivisions in no particular order, such as comparing sales volume by product code. A bar chart is used most often for nominal comparisons.

 

  • Geospatial or geographic: A comparison of a variable across a map or layout, such as the unemployment rate by state. A cartogram is typically used to show geographic or geospatial messages.
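Two of these message types, rendered as charts in a minimal matplotlib sketch (the figures are invented):

  # Minimal sketch: a time-series line chart and a ranking bar chart.
  import matplotlib.pyplot as plt

  fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3))

  years = list(range(2000, 2010))
  unemployment = [5.2, 5.0, 5.8, 6.1, 5.6, 5.1, 4.7, 4.6, 5.8, 9.3]
  ax1.plot(years, unemployment)       # time-series: a line chart shows the trend
  ax1.set_title("Unemployment rate")

  sales = {"Ann": 42, "Ben": 35, "Cat": 51}
  ranked = sorted(sales.items(), key=lambda kv: kv[1], reverse=True)
  ax2.bar([name for name, _ in ranked], [v for _, v in ranked])  # ranking: bar chart
  ax2.set_title("Sales by person")

  plt.tight_layout()
  plt.show()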

 

Methods Used for Analyzing Quantitative Data

 

Methods used for analyzing large collections of quantitative data include checking the raw data for anomalies before performing the analysis, and re-performing important calculations, such as verifying columns of data that are formula-driven. Further methods include confirming that main totals are the sum of their individual subtotals.

 

Another excellent method used in analyzing quantitative data is to check the relationships between numbers that should be related in a predictable way, such as ratios over time. A further step is normalizing numbers to make comparisons easier, such as analyzing amounts per person or relative to GDP; another example is expressing values as an index relative to a base year. The next step is breaking problems down into their components by analyzing the factors which led to the outcome, such as DuPont analysis of return on equity.

 

Stages of quantitative analytics:

  • Framing the problem: problem recognition, review of previous findings
  • Solving the problem: modeling, data collection, data analytics
  • Communicating and acting on results: result presentation and action

 

To examine a given variable, descriptive statistics are mostly used, such as the median, the average, and the standard deviation. Descriptive statistics are also used for analyzing the distributions of the important variables, to see how the values cluster around and between each other. The technique used for dividing variables and breaking them down into smaller parts is referred to as the MECE principle, after McKinsey and Company.

 

Under this principle, each layer can be divided and broken down into components; each of the components has to be mutually exclusive of the others, and together they have to add up collectively to the layer above them. This relationship is referred to as MECE: mutually exclusive and collectively exhaustive.

 

An example of such a breakdown: profit can be broken down into independent components, in this case total revenue and total cost. Further, total revenue can be analyzed by its components, such as the revenues of the various divisions, each of which is mutually exclusive of the others. Together the division revenues add up to the total revenue, making the breakdown collectively exhaustive.

 

Mutually Exclusive and Collectively Exhaustive:

 

  • Mutually exclusive means there is zero overlap:
    each element is distinct from the others
  • Collectively exhaustive means there are zero gaps:
    all possibilities are covered
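A minimal Python sketch of checking a revenue breakdown against both properties, with invented figures:

  # Minimal sketch: verify a MECE breakdown of total revenue by division.
  divisions = [("north", 400), ("south", 250), ("online", 350)]
  total_revenue = 1000
  total_cost = 800

  names = [name for name, _ in divisions]
  assert len(set(names)) == len(names)                  # mutually exclusive: no overlap
  assert sum(r for _, r in divisions) == total_revenue  # collectively exhaustive: no gap

  print("profit:", total_revenue - total_cost)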

 

Analysts can also use various statistical measurements to solve particular data analysis problems. Hypothesis testing is used when a hypothesis about the true state of affairs is made and data is gathered to determine whether that state is true or false. An example would be the hypothesis that unemployment has no effect on inflation. Hypothesis testing considers the probability of two types of errors related to the data: rejecting the hypothesis when it is true, and accepting it when it is false.
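A minimal Python sketch of such a test, assuming SciPy is available; the unemployment and inflation series are invented:

  # Minimal sketch: test the hypothesis of no relationship between
  # unemployment and inflation with a correlation significance test.
  from scipy import stats

  unemployment = [4.1, 4.5, 5.0, 5.8, 6.3, 7.1, 7.9]
  inflation = [3.9, 3.5, 3.2, 2.8, 2.4, 2.1, 1.6]

  r, p_value = stats.pearsonr(unemployment, inflation)
  alpha = 0.05  # tolerated probability of a Type I error (false rejection)
  if p_value < alpha:
      print("reject the no-relationship hypothesis: r=%.2f, p=%.3f" % (r, p_value))
  else:
      print("cannot reject the no-relationship hypothesis")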

 

 

Regression analysis is also commonly used to determine the extent to which an independent variable affects a dependent variable. For example, regression analysis can show us the extent to which changes in the unemployment rate affect the inflation rate; in this case, the unemployment rate is the independent variable and the inflation rate is the dependent variable. Regression analysis fits a model, a line or curve given by an equation, to the data.

 

To determine the messages contained within quantitative data collections, necessary condition analysis may be needed. In this case, an analyst uses condition analysis to determine the extent to which an independent variable allows a dependent variable, for example the extent to which a particular unemployment rate is necessary for a particular inflation rate. Multiple regression analysis uses additive logic: each independent variable can produce the outcome, and the independent variables can compensate for each other; in other words, they are sufficient but not necessary. Necessary condition analysis uses the opposite logic: single or multiple variables allow the outcome to exist, but do not by themselves produce it. Each necessary condition must be present, but no compensation is expected.

 

Data analytics techniques:

  • Regression: Linear and non-linear
  • Classification: Supervised and unsupervised

 

Data Analysis Tasks

 

Data analysis tasks can be organized around three main tasks: finding data points, retrieving values, and arranging data points. These three are only the main tasks, however, and data analysis requires additional steps toward an outcome.

 

The first task is to retrieve values: given a collection of data cases, the values of attributes within those cases have to be retrieved. Further, when concrete conditions are given, analysts use filters to find the data cases satisfying those conditions.

 

Given a collection of data cases, an aggregate numerical representation can be obtained by computing a derived value. Further, the analysis will find the data cases possessing an extreme value of an attribute over its range within the data collection. The next step is to sort and rank a collection of data cases by some metric chosen by the user.

 

Further, the range of values within a collection of data cases is determined for an attribute of interest: the span of values of that attribute within the data collection.

 

The next step is to characterize the distribution of a quantitative attribute of interest over a collection of data cases: how the values of that attribute are distributed over the collection. Further, anomalies within a given collection of data cases can be identified with respect to a given expectation or relationship.

 

The next step is clustering: finding groups of data cases with similar values of attributes of interest. Further, the correlation between attributes across a collection of data cases is determined: whether there are significant relationships between the values of two attributes. The final step is to find contextual relevancy: the data cases in a given collection which are relevant to the users.
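Several of these tasks, expressed as one minimal pandas sketch over an invented dataset:

  # Minimal sketch: common data analysis tasks with pandas.
  import pandas as pd

  df = pd.DataFrame({
      "city": ["A", "B", "C", "D"],
      "sales": [120, 340, 90, 210],
      "staff": [4, 10, 3, 6],
  })

  print(df.loc[df["city"] == "B", "sales"])        # retrieve values
  print(df[df["sales"] > 100])                     # filter on a condition
  print(df["sales"].sum())                         # compute a derived value
  print(df.loc[df["sales"].idxmax()])              # find the extremum
  print(df.sort_values("sales", ascending=False))  # sort and rank
  print(df["sales"].max() - df["sales"].min())     # determine the range
  print(df["sales"].describe())                    # characterize the distribution
  print(df["sales"].corr(df["staff"]))             # correlate two attributes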

The elements of a data analysis problem include the following:

 

  • Experimental information: Contains details of the experiment
  • Measurement physics: Contains information about the underlying physics, such as sources of noise or the type of signal
  • Data: The actual data to be analyzed, including metadata
  • Analysis tasks: The problem to be solved
  • Analysis techniques: The algorithms used to solve the problem

 

Data analysis methods:

  • Summarizing the data: Graphs, summary tables, descriptive statistics
  • Finding hidden relationships: Grouping, searching, correlation statistics
  • Making predictions: Mathematical models, inferential statistics


Chapter 2 Fundamentals of Data Modeling

 

Data modeling is the software engineering activity of creating a data model for an information system by applying formal techniques. A data model is an abstract model which organizes certain elements of data and standardizes how they relate to one another. For example, a data model may specify that a data element can be composed of a number of other elements.

 

The term data model is used in two distinct but related senses. Often it refers to an abstract formalization of the objects and relationships found in a particular application domain: for instance, the customers, products, and orders found in a manufacturing company. In the other sense, it refers to the set of concepts used in defining such formalizations: for instance, concepts like entities, attributes, relations, and tables. So the data model of a banking application may be defined using the entity-relationship model.

 

A data model determines the elements and overall structure of datasets. It is specified in a data modeling notation, which is most commonly graphical. A data model is also often referred to as a data structure, especially in the context of programming languages. In the context of enterprise models, on the other hand, a data model is often complemented by function models.

 

The main function of an information system is to manage large amounts of both structured and unstructured data. Data models describe the structure, integrity aspects, and manipulation of the data stored in data collections. They do not typically describe unstructured data such as email messages, word processing documents, digital video and audio, and pictures.

 

The Main Aim of Data Models

 

When it comes to the aims of data models, the main aim is to support the development of information systems by providing the definition and format of data. If data models are defined consistently across information systems, data compatibility can be achieved: when the same data structures are used for both storing and accessing data, multiple applications can share that data.

 

When data is not shared cleanly across applications, systems cost more money to build, operate, and maintain. A great cause of this is that the quality of the data models implemented in systems and interfaces is often poor. For example, business rules which specify how certain things are done in a certain place are often fixed in the structure of a data model; this means that small changes in the way business is conducted can lead to large changes in computer systems and interfaces.

 

Another problem occurs when entity types are not identified, or are identified incorrectly. In this case, replication of data can occur, and because of that duplication the data structure and its functionality cost more than they should: the development of the data model and its maintenance become more expensive.

 

Data models are different for different systems, and the result is that complex interfaces are required between systems which share data; these interfaces can account for up to seventy percent of the overall cost of systems. Another problem is that data often cannot be shared electronically with customers and suppliers, because the structure of the data has not been designed in a standard form. For instance, engineering design data and drawings for process plants are often still exchanged on paper.

 

The main reason for these problems is the lack of data modeling standards which would ensure that data models both meet business needs and stay consistent. A data model determines the structure of the data, and the most common applications of data models include database models, the design of information systems, and enabling the exchange of data. Data models are most commonly specified in a data modeling language.

 

Types of Data Models

 

There are three fundamental kinds of data models: the conceptual data model, the logical data model, and the physical data model.

 

Conceptual Data Model

 

  • This data model describes the semantics of the domain which is the scope of the model: for instance, a model of the area of interest of an organization or industry. It consists of entity classes, representing the kinds of things of significance in the domain, and relationships: assertions about associations between pairs of entity classes. A conceptual schema specifies the kinds of facts and propositions which can be expressed using the model; in that sense, it defines the allowed expressions in an artificial language whose scope is limited by the scope of the model.

 

Logical Data Model

 

  • The logical data model describes the semantics as represented by a particular data manipulation technology. It consists of descriptions of tables and columns, object-oriented classes, XML tags, and the like.

Physical Data Model

 

  • The physical data model describes the physical means by which data is stored. It is concerned with partitions, tablespaces, CPUs, and so on.

 

[Figure: data models comprise the conceptual schema, the logical model, and the physical model]

 

The significance of this three-level approach is that it allows the conceptual, logical, and physical perspectives to be relatively independent of one another. Storage technology can change without affecting either the logical or the conceptual model; the table and column structure can change without necessarily affecting the conceptual or logical model. In each case, of course, the structures must remain consistent with the other models.

 

The table and column structure can differ from a direct translation of the entity classes and attributes, but it must ultimately carry out the objectives of the conceptual class structure. Early phases of many software development projects emphasize the design of a conceptual data model. Such a design can be detailed into a logical data model and, in later phases, translated into a physical data model. However, it is also possible to implement a conceptual model directly.

 

Types of data models:

  • Enterprise data model: Documents very high-level business objects and definitions. Provides a wide scope, giving a strategic view of enterprise data.
  • Conceptual data model: Documents the key business attributes and definitions of business data objects, and shows the relationships between them at a broader scope. This model is also known as an area data model.
  • Logical data model: Documents the business key attributes and definitions of business data objects, and shows the relationships between them. It is usually within the scope of a detailed project.
  • Physical data model: Shows the technical design: the columns, tables, keys, foreign keys, and other constraints to be implemented in the database or XSD. This model may be generated from the logical data model.

 

Database Models

 

A database model is a data model which determines the logical structure of a database. It fundamentally determines the manner in which data can be stored, organized, and manipulated. The most common database model is the relational model, which uses a table-based format to determine the structure of a database.

 

Database logical data models include:

  • Hierarchical model
  • Network model
  • Relational model
  • Entity-relationship model
  • Object model
  • Document model
  • Star schema
  • Enhanced entity-relationship model

 

Database physical model includes:

  • Flat file
  • Inverted Index

 

Besides these common database models, there are others which can be implemented, such as the correlational model, the multivalue model, the associative model, the semantic model, the triplestore, the XML database, the named graph, and the multidimensional model. A given database management system may provide one or more models. The optimal structure depends on the natural organization of the application's data and on the application's requirements, which include transaction rate (speed), scalability, reliability, maintainability, and cost. Most database management systems are built around one specific data model, although it is possible for products to support more than one.

 

Many physical database models can implement any given logical model. Most database software offers the user some control over the physical implementation, and the choices made there have a great impact on performance. A database model is both a way of structuring data and a way of defining the set of operations which can be performed on the data. The relational model, for example, defines operations such as select, project, and join. Although such operations may not be explicit in a particular query language, they provide the foundation on which a query language is built.

 

Flat Database Model

 

The flat database model is also referred to as a flat file database or a spreadsheet. It consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values and all members of a row are assumed to be related to one another.

 

For example, columns for name and password might be used as part of a security database; each row would then hold the password associated with a single user. Columns of the spreadsheet often have a type associated with them which characterizes them as data: integers, date and time information, floating point numbers, and so on. This tabular representation is a precursor of the relational model most commonly used today.

 

An example of a flat database model:

             Route No.   Miles   Activity
  Record 1   I-95        12      Overlay
  Record 2   I-495       05      Patching
  Record 3   SR-301      33      Crash seal
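The same flat structure, read in Python as a single two-dimensional collection of rows (the CSV text is embedded inline for the example):

  # Minimal sketch: a flat model as one two-dimensional array of rows.
  import csv, io

  flat_file = io.StringIO(
      "route_no,miles,activity\n"
      "I-95,12,Overlay\n"
      "I-495,05,Patching\n"
      "SR-301,33,Crash seal\n"
  )
  for row in csv.DictReader(flat_file):
      print(row["route_no"], row["miles"], row["activity"])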

 

Hierarchical Database Model

 

Models like the hierarchical database model were mostly used in the 1970s, but they are still found today in various legacy systems. They are primarily characterized as navigational, with a strong linkage between their logical and physical representations, and they are deficient in data independence.

 

In this model, data is organized in a tree-like structure, implying a single parent for every record in the model. A sort field keeps sibling records in a particular order. Such models were used in the early mainframe database management systems, such as IBM's Information Management System (IMS), and the hierarchical structure is now also used in XML databases. This model allows one-to-many relationships between the kinds of data it holds, and it is very useful and efficient for describing many relationships in the real world: tables of contents, recipes, any sorted and nested information, the ordering of paragraphs and verses, and others.

 

This model is also used to describe the physical arrangement of records in storage. Records are accessed by navigating through the data structure using pointers combined with sequential access. Because of this, the hierarchical structure is inefficient for database operations which do not include a full path for each record. These limitations were later compensated for in IMS by imposing additional logical hierarchies on the base physical hierarchy.

 

An example of a hierarchical database model:
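A minimal Python sketch of such a tree-like structure, where every record has a single parent; the company data is invented:

  # Minimal sketch: hierarchical records as a tree with one parent per record.
  company = {
      "name": "Head office",
      "children": [
          {"name": "Sales", "children": [
              {"name": "East region", "children": []},
              {"name": "West region", "children": []},
          ]},
          {"name": "Engineering", "children": []},
      ],
  }

  def walk(record, depth=0):
      # Access navigates pointers from parent to child, as in navigational models.
      print("  " * depth + record["name"])
      for child in record["children"]:
          walk(child, depth + 1)

  walk(company)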

 

 

Network Database Model

 

The network database model is in fact an expanded form of the hierarchical model. It allows multiple relationships in a tree-like structure which permits multiple parents. The network model was widely used before it was replaced in many fields by the relational model. It organizes data using two fundamental constructs, called records and sets. Records contain fields, which may be organized hierarchically. Sets define one-to-many relationships between records: one owner (parent) and many members (children). A record may be an owner in any number of sets, and a member in any number of sets as well.

 

A set in a network database model is often implemented as a circular linked list, where one record type, the set owner or parent, appears once in each circle, while the subordinate or child records can appear multiple times in each circle. In this way a hierarchy may be established between any two record types: for example, one set type can be defined alongside another set type in which the owner of the first is a member of the second. Thus all the sets together comprise a directed graph (ownership defines a direction), or a network construct. Records can be accessed either sequentially or by navigating the circular linked lists.

 

The network model can represent redundancy in data more efficiently than the hierarchical model, since more than one path can exist between an ancestor node and a descendant record. The operations of the network model are mostly navigational in style: a program maintains a current position and navigates from one record to another by following the relationships in which the record participates. Records can also be located by supplying key values.

 

Network database models implement the set relationships using pointers which directly address the location of a record on disk. This gives excellent retrieval performance, at the expense of operations such as database loading and reorganization. Many object databases in fact use this navigational principle, often using record identifiers as pointers to related records, to provide fast navigation between objects and records.
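A minimal Python sketch of the idea: the same member record participates in sets owned by two different parents, which a strict tree would not allow; the records are invented:

  # Minimal sketch: network-model sets, one owner and many members per set.
  class Record:
      def __init__(self, name):
          self.name = name
          self.members = []  # the set of records owned by this record

  supplier = Record("Supplier A")
  warehouse = Record("Warehouse 1")
  part = Record("Part 42")

  supplier.members.append(part)    # the same child record appears in the
  warehouse.members.append(part)   # sets of two different owners

  for owner in (supplier, warehouse):
      print(owner.name, "->", [m.name for m in owner.members])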

 

Relational Database Model

 

The relational database model is the most commonly used model; it was introduced in 1970. It enables ways of analyzing data which make database management systems more independent of any particular application. It is a mathematical model defined in terms of predicate logic and set theory, and systems implementing it are used on mainframe, midrange, and microcomputer systems. Products based on this model are usually simply called relational databases, although they in fact implement an approximation of the mathematical model defined by E. F. Codd. Three key terms are used in the relational model: relations, attributes, and domains. A relation is represented as a table with rows and columns; the named columns are the attributes, and the domain is the set of values the attributes are allowed to take.

 

The basic data structure of the relational model is thus the table, and the information about a particular entity (say, an employee) is represented in the table's rows, which are also referred to as tuples. Hence, in the relational model, "relation" refers to the tables contained in the database: a relation is a set of tuples. The columns enumerate the attributes of the entity, such as an employee's name, address, or phone number, while a row represents one specific instance of the entity, such as a particular employee. Each tuple in the employee table thus represents the set of attributes of a particular employee.

 

All relations (that is, tables) in this kind of database model have to follow some basic rules in order to qualify as relations. The first rule is that the ordering of the columns in a table is immaterial. The second rule is that no two tuples (rows) in a table may be identical. The third rule is that every tuple must contain a single value for each of its attributes. A relational database may contain many tables, each of which is similar to a table in the flat model.

 

On the other hand, one of the strengths of this model is that any value which occurs in two different records, belonging to the same table or to different tables, implies a relationship between those records.

 

However, in order to enforce integrity constraints, relationships between records can also be defined explicitly, for example by identifying a parent-child relationship and assigning it a cardinality such as one-to-one or one-to-many. Tables can also have a designated single attribute, or a combination of attributes, that acts as a key which can be used to uniquely identify each tuple in the table.

 

A key which is used to uniquely identify a row in a table is referred to as the primary key. Keys are commonly used to join or combine data from two or more tables. For instance, an employee table may contain a column named location which contains a value matching the key of a location table. Keys are also important in the creation of indexes, which facilitate fast retrieval of data from large tables.

 

It should be noted that any column can be a key, and multiple columns can be combined into a single compound key. It is not necessary to define all the keys in advance; a column can be used as a key even if it was not originally intended to be one.

 

A key which has real-world, external meaning is called a natural key. If no natural key is suitable, an arbitrary or surrogate key can be assigned instead. In practice, most databases have both natural and generated keys: generated keys can be used internally to create relationships between rows which cannot break, while natural keys are used, less reliably, for searches and for integration with other databases.

 

An example of a relational database model, with two tables related through the shared key value 24:
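A minimal sketch of those two tables and their join, using sqlite3 from the Python standard library; the employee and location rows are invented, reusing 24 as the shared key:

  # Minimal sketch: relations as tables, a primary key, and a join on the key.
  import sqlite3

  con = sqlite3.connect(":memory:")
  con.executescript("""
      CREATE TABLE location (id INTEGER PRIMARY KEY, city TEXT);
      CREATE TABLE employee (
          id INTEGER PRIMARY KEY,
          name TEXT,
          location_id INTEGER REFERENCES location(id)
      );
      INSERT INTO location VALUES (24, 'Chicago');
      INSERT INTO employee VALUES (1, 'Alice', 24);
  """)

  # The shared key value (24) relates tuples across the two tables.
  for row in con.execute(
      "SELECT employee.name, location.city "
      "FROM employee JOIN location ON employee.location_id = location.id"
  ):
      print(row)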

 

 


Chapter 3 Getting Started with Power BI

 

In this chapter of the book, we will get familiar with Power BI, the powerful tool provided by Microsoft. Power BI is a great business analytics tool which can deliver significant insights throughout your business organization. Power BI enables you to link together hundreds of different data sources, drive ad hoc analysis, and simplify data preparation. It also allows you to produce outstanding reports and then publish those reports for your business organization to view both on the web and on portable devices.

 

As we said, Power BI by Microsoft is an outstanding business analytics tool, and business analytics refers to the practices, skills, and techniques used for iterative investigation and exploration of past business performance. This perspective on past performance is key to gaining current insight and properly driving business planning.

 

Business analytics is mostly focused on developing new and better insights into the business market and a better understanding of overall business performance, based on statistical methods and data analytics. Business intelligence, on the other hand, is focused on a consistent collection of metrics to measure past performance and to guide future business planning, likewise based on statistical methods and data.

 

Business analytics makes extensive use of statistical methods such as predictive modeling, explanatory modeling, and fact-based management to drive decision-making, and is therefore closely related to management science. Its output can serve as input either for fully automated decisions or for decisions made by humans. Business analytics is, in fact, reporting, querying, and online analytical processing, all of which the Power BI tool provides to both desktop and portable-device users.

 

Reporting, querying, alerting, and online analytical processing tools like Power BI have a great impact on overall business planning and management. These tools can answer important questions such as what happened and how often, where the problem is, what the solution may be, and which actions are needed. Business analytics tools provide these answers, and they can also predict what will occur next and what the best possible outcome is, so that predictions and optimizations can be developed.

 

Business analytics answers questions such as:

  • What happened?
  • When did it happen?
  • Why did that happen?
  • Will that happen again?
  • What will happen if we do something differently?
  • What is the data telling us?

 

Types of business analytics include decision analytics, which supports human decisions with visual analytics models used to reason about the data. Another important type is descriptive analytics, used to gain better insight from past data through clustering, reporting, scorecards, and the like. Predictive analytics is also an important field, using predictive modeling based on machine learning and statistical techniques. Finally, prescriptive analytics recommends decisions using simulation and optimization.

 

Business analytics includes:

  • Statistical and quantitative analysis
  • Data mining
  • Multivariate testing
  • Predictive modeling
  • Text analytics
  • Big data analytics

 

Power BI is a great tool both for analysts and for business users. With Power BI you can transform and clean data simply, whereas data cleaning can be time-consuming with other software. The tools for data modeling and data shaping are extremely easy to use, and they will save you many hours in your busy day. Power BI is a powerful solution for data analysis, and its capabilities allow you to connect, shape, visualize, and share data quickly. When it comes to delivering data for decision-making, Power BI lets you publish and share interactive reports across various platforms.

 

Power BI lets you go easily from raw data to insight to final action. You can create reports in minutes and connect to hundreds of different sources. Power BI is also compatible with portable devices, so you can view your dashboard at any time on the web, from a desktop computer or a mobile device. The tool offers simplified management, and your data is secured as soon as you publish it.

 

To start, you should first get to know what this tool offers and how it can significantly improve your business decisions. Power BI is a collection of software apps and services that work together to turn your data into coherent, visually immersive, and interactive insights. It allows you to easily connect to various data sources, explore and visualize them, and publish and share the results on the web.

 

Power BI also lets you create quick insights from Excel spreadsheets as well as from local databases. The tool is always ready for real-time data analysis and data modeling, which makes it both a personal visualization and reporting tool and a great tool for entire corporations, divisions, groups, and individual projects. Power BI consists of three elements, the desktop application, the mobile apps, and the service, so you can use Power BI on the go as well.

Therefore, you can use Power BI for various data analysis tasks, including number-crunching, monitoring progress on your sales, creating engaging business reports, getting better insight into the current market, and more. When you create a report, it can be published to the Power BI service and shared with other users, who can then consume the information.

 

Everything you do with Power BI can be broken down into a few fundamental building blocks. Once you get to know these building blocks, you can expand on each of them and begin creating more complex reports.

 

The Fundamental Building Blocks:

  • Visualizations
  • Datasets
  • Reports
  • Dashboards
  • Tiles

 

A visualization is often referred to as simply a visual. A visualization is a visual representation of data, like a color-coded map, a chart, a graph, or any other form in which data can be presented visually. Visualizations may be as simple as a single number that represents something important, or as visually complex as a gradient-colored map showing sentiment toward a particular social issue. The ultimate goal of any data visualization is to present data in a way that provides insight and context which would be difficult to obtain from a raw table of text or numbers.

 

The dataset is already familiar to you: it is the collection of data Power BI uses to create visualizations. A dataset can be a single table created in Excel, or a combination of various sources which you can combine and filter to create a unique dataset for use in Power BI.

For instance, you can create a dataset drawn from multiple different data sources, such as an Excel table, a table from a website, and the online results of a marketing campaign. That uniquely designed dataset is still referred to as a single dataset, even though it is combined from multiple sources.

 

With Power BI you can also filter your data, which allows you to focus on the features that are important to you. For instance, you can filter a contact database so that only the customers who received emails are included in the dataset, and then create visuals based on that specific subset of users who were included in your campaign. Filtering data helps you focus on what matters and concentrate your overall efforts.
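As a small, hedged sketch of this idea on the modeling side, the DAX measure below counts only the contacts flagged as having received an email; the Contacts table and its ReceivedEmail column are hypothetical names:

    -- Hypothetical measure: counts rows of an assumed Contacts table,
    -- restricted to customers who received a campaign email.
    Emailed Customers =
    CALCULATE (
        COUNTROWS ( Contacts ),
        Contacts[ReceivedEmail] = TRUE ()
    )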

 

This powerful business analytics tool also provides a multitude of built-in data connectors. Whether the data you want is in a SQL database, Oracle, Azure, or an Excel spreadsheet, these built-in connectors allow you to connect to the source easily, filter the data if needed, and bring it into your dataset. Once you have a dataset, you can start working on visualizations that display portions of it in numerous ways.

 

A report is a collection of visualizations that appear together on one or more pages. In other words, a report in Power BI is a collection of components that are related to each other in some way, like the reports you might use in a sales presentation. You can create reports in Power BI Desktop or in the Power BI service. Reports let you lay out multiple visualizations across one or more pages, arranged however you like. You might create a report on quarterly sales, on product growth in a certain segment, on migration patterns, or on anything else you can think of. Whatever the topic, a report lets you combine and organize a wide range of visualizations into a single whole.

 

Once you are ready to share a collection of visualizations, or pages from a report, you create a dashboard. Much like any other dashboard, a Power BI dashboard is a collection of visuals that can be shared with other users via the Power BI service. A dashboard is often a group of visuals that provide quick and easy insight into the data you want to present. A dashboard must fit on a single page, often called a canvas, and it can be shared so that other people can interact with it on their portable devices.

 

Another fundamental element of Power BI is the tile, which is a single visualization found within a report or a dashboard. A tile is a rectangular box that contains an individual visual. When you are creating reports and dashboards in Power BI, you can move and arrange tiles however you want, and you can resize them by adjusting their width and height. You can also interact with tiles created by someone else, which is called consuming, or viewing, a report or dashboard; when you are only viewing or consuming, you cannot change the way the tiles are arranged.

 

Power BI Service

 

As we already know, the common flow in Power BI is to create a report or a dashboard, publish and share it on the Power BI service, and let other users view and interact with your work, both in the service and in the mobile app.

 

The Power BI service includes content packs, which are collections of pre-configured, ready-made visuals based on a particular data source, such as Salesforce. A content pack offers a wide collection of items designed to go together: its reports and dashboards come ready to use, all combined for you.

 

You can get data into the Power BI service with just one click: simply select Get Data, located in the bottom corner. You will see the available sources you can use, including sources like Azure data and Excel files and databases. Power BI also lets you connect to SaaS cloud services such as Facebook, Salesforce, Google Analytics, and others. From these services you can obtain ready-made collections of visuals, pre-arranged into reports and dashboards; these are the content packs. Pre-arranged content packs let you get started instantly: just select the service you want.

 

For instance, if you use the Salesforce pack, Power BI will connect to your Salesforce account, and a pre-defined collection of visuals, presented in both reports and dashboards, will be available for your use.

 

Power BI offers various content pack sources, arranged in alphabetical order so you can easily find the service you want. Once the data from the chosen pack is loaded, you will see the content pack's dashboards and reports.

 

In addition to the provided reports and dashboards, a dataset is also included, containing the data pulled from the service pack you chose.

 

Power BI Service:

  • Software as a Service (SaaS)
  • Hybrid solution: on-premises or cloud data sources
  • HTML5 reports and dashboards with mobile app support
  • Real-time and streaming data source support

 

On the provided dashboard, you can click any visual, and you will automatically be taken to the report page behind that visual.

 

Another great feature of the Power BI service is that you can ask questions of your data in natural language and design visuals based on those questions in real time.

 

Once you see a visual you like, select the Pin icon located to the right of the natural language bar, and that visual will be placed on your dashboard.

 

Another option provided in the Power BI service is refreshing the data contained in a content pack, or any other data you are using. To refresh data, select the three dots located next to the dataset, and a menu will appear.

 

The next step is to select the Schedule Refresh option at the bottom of the menu. A scheduling dialog will then appear, where you can set up a refresh for any data you want.

 

Power BI Steps:

  • Bringing data into Power BI and creating a report
  • Publishing the report to the Power BI service where you will build dashboards and design new visualizations
  • Sharing your dashboard with other users
  • Viewing and interacting with shared reports and dashboards using Power BI service


Chapter 4 Analyzing and Visualizing Data with Power BI

 

The majority of Power BI features are available both in Power BI Desktop and in the Power BI service. We are already familiar with getting data, but it often happens that the data obtained is not as well-formed or clean as it should be. In that case, the gathered data has to be transformed or cleaned, a process known as data cleaning and data transformation.

 

Both Power BI Desktop and the Power BI service allow you to connect, clean, and visualize your data, and using these tools you can model and visualize data in many ways. Once you have downloaded and installed the Power BI Desktop application on your computer, you are ready to go.

 

This tool lets you connect to a wide range of data sources, from local Excel spreadsheets to cloud services. Formatting and cleaning features are well implemented, making your data usable in various contexts. You can rename and split columns, work with dates, and change data types, and you can also create relationships between tables to make further modeling and analysis easier.

 

Once you open Power BI on your computer, select Get Data to get started. A collection of data sources will appear, and you can choose among them. Whichever data source you decide to work with, Power BI will connect to it and show you which data is available from that source.

 

To start building reports, select Report view. The report view contains five areas: the ribbon, the report canvas, the pages tab, the visualizations pane, and the fields pane.

 

  • The ribbon: displays tasks associated with visualizations and reports
  • The report view: the canvas where visualizations are created and arranged
  • The pages tab: allows you to add or select a report page
  • The visualizations pane: where you change visualization types, customize axes and colors, drag fields, and apply filters
  • The fields pane: query elements and filters that can be dragged onto the Report view, or dragged to the Filters area of the Visualizations pane

 

The fields and visualizations panes can be collapsed by selecting the small arrow along their edge, providing more space in the report view for building visualizations; they can be expanded again the same way. To create a visualization, just drag a field onto the report view. If the field contains geolocation data, such as a State field, Power BI automatically creates a map-based visualization.

 

Once you have created your data visualization, it is ready to be published. To do this, go to the Home ribbon and select the Publish button. You will have to sign in to Power BI first; once you do, your visualization is published. When you sign in to your Power BI account, you will see the published visualization in the Power BI service, and you can open the report and select the Pin icon to add the visual to your dashboard.

 

Connecting Data Sources

 

In Power BI you can connect to a wide range of data sources. Currently, over fifty-nine cloud services such as Marketo and GitHub provide data connectors, and you can also connect to sources through ODBC, text, CSV, and XML. Power BI can even scrape data directly from a website. As before, first select the Get Data button located on the Home tab. Various sources are offered; to get started, select one to establish a connection. Depending on the source you select, you may be asked to find it on a network or on your computer.

Once you connect to a data source, the Navigator window appears. It displays the entities or tables of your chosen data source, and clicking one gives you a preview of its contents. Selected entities and tables can be imported immediately, or you can select Edit to adjust the data before importing it.

 

Once you have selected the entities and tables you would like to bring into Power BI, select Load, located in the bottom corner of the Navigator. Sometimes you may wish to make changes to entities and tables before loading them; to do this, select the Edit button instead, and transform or filter the data before loading it into Power BI.

 

Power BI also includes the Query Editor, a powerful tool for transforming and shaping data so that your data models and visualizations can be outstanding. To launch the Query Editor, select Edit in the Navigator. Another way to launch it from Power BI Desktop is by selecting Edit Queries on the Home ribbon.

 

Once you open the Query Editor with data loaded and ready to be transformed, you will notice several sections, including the display of the selected query, a listing of the query's properties, the applied steps, and others. To see the available transformations, right-click a column. You can make changes such as duplicating the column, removing it, replacing values, or renaming it, and you can also split text columns into multiple columns by common delimiters.

 

The Query Editor also contains additional tools, such as changing the data type, extracting elements from dates, and adding scientific notation. Each transformation you apply appears in the Applied Steps section of the Query Settings pane; this list can be used to review and undo changes you made previously. To save your changes, select the Close & Apply button on the Home tab, and the Query Editor will apply all your changes to Power BI Desktop.

 

Data Transformation and Data Cleaning

 

Once you have shaped your data using the Query Editor, you can also look at it in different ways. Power BI offers three views: Data view, Report view, and Relationship view. To access any of them, select its icon along the edge of the canvas.

 

Power BI views:

  • Data view
  • Relationship view
  • Report view

 

To change the view, simply select one of the other two icons. During the data modeling process, Power BI can combine data from various sources into one report. You can add a source to an already created report by selecting Edit Queries on the Home ribbon and then selecting New Source. You can also import data from several files at once; these will appear as binary content in the Query Editor.

 

One of the most useful tools here is filtering: for instance, you can open a checklist of text filters and use it to remove certain values from your data model. You can also append queries, turning multiple tables into a single table that contains only the data you want; the Append Queries tool likewise lets you add data from a new table to an already existing query. In Power BI you can also add a custom column written in terms of the M query language, and a related idea is sketched below.
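The Query Editor's Append and custom columns are expressed in M; purely as a hedged illustration of the same append idea on the modeling side, the DAX sketch below combines two assumed tables into a single calculated table. The table names Sales2016 and Sales2017 are hypothetical, and the tables are assumed to have identical columns:

    -- Hypothetical calculated table appending two tables with identical
    -- columns; a custom column could then be added to it as a calculated
    -- column, as shown later in this chapter.
    All Sales = UNION ( Sales2016, Sales2017 )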

 

Although Power BI allows you to import data from many data sources, its modeling and visualization tools work best with data in columnar format. It may happen that your data is not imported in a simple columnar format, which is often the case with Excel spreadsheets that are not optimized for automated queries. However, Power BI provides tools to quickly and easily transform multi-column data into datasets that are fit for use.

Using Transpose in the Query Editor, you can turn rows into columns and columns into rows, breaking the data into a format that can be easily manipulated. Sometimes you may also need to format data so it can be identified and categorized once it is loaded. You can use Fill to replace empty values with the values located below or above them in a column.

 

Unpivot Columns can also be used to cleanse data into a dataset you can work with. Power BI lets you experiment with numerous transformations on your data and determine which kinds of data will work in columnar format. Note that every action you take is recorded in the Applied Steps section, so you can always undo changes you have made.

 

Modeling Your Data

 

To combine various data sources logically, you need to create relationships between them. Relationships tell Power BI exactly how tables are related to each other, so it can build reports and visuals across them. One of the great advantages of these tools is that there is no need to flatten your data: you can use tables from different sources and define the relationships between them. You can also create calculations and metrics over specific segments of your data, and then use those metrics in visualizations for much easier modeling.

 

You can therefore set the relationships between tables visually. To see a diagrammatic view of your data, use the Relationship view, located on the left next to the Report view. There you will see a block representing each table and its columns, with lines between them representing relationships. Removing and adding relationships is easy: to remove one, right-click it and select Delete; to create one, drag and drop between the tables. You can also hide an individual column or table by selecting Hide in Report View.

 

For a more detailed view of your data relationships, select Manage Relationships on the Home tab. The Manage Relationships dialog will appear, displaying all relationships as a list, and from there you can also edit relationships manually, including their cardinality and cross-filter direction.

 

The options available for cardinality include many-to-one and one-to-one. Many-to-one is the dimensional type of relationship, while one-to-one is used for connecting single entries in reference tables.

 

Cross-filtering in a single direction, on the other hand, places some limitations on data modeling capabilities. It is important to set relationships accurately, since they allow you to design more complex calculations across multiple data components.

 

Using Power BI you can also create calculated columns, which are an easy way of enhancing and enriching your data. A calculated column is a newly created column defined by a calculation that transforms or combines components of already existing data. Once you have created calculated columns, the next step is to optimize the data model to create better visuals; Power BI allows you to optimize your data so it is instantly more usable for visuals and reports.
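As a minimal sketch, the DAX calculated column below derives a new value from two existing columns of an assumed Sales table; the table and column names are hypothetical:

    -- Hypothetical calculated column: computed row by row from two
    -- existing columns of the Sales table.
    Gross Revenue = Sales[Quantity] * Sales[UnitPrice]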

 

Further steps include creating measures and working with time-based functions. The Data Analysis Expressions (DAX) language in Power BI offers many useful functions, particularly time-based ones such as year-over-year and year-to-date calculations. You define a calculation once and can then slice it by various fields; these defined calculations are called measures. Also significant in the data analysis process are calculated tables, which open up a wide range of new data modeling capabilities. For instance, if you want to perform various types of joins, or simply create new tables based on a formula, calculated tables are the way to go.
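The following hedged DAX sketch shows a base measure, a year-to-date version of it, and a same-period-last-year variant for year-over-year comparisons. It assumes a Sales table with a Revenue column and a related Dates table marked as the date table; all names are hypothetical:

    -- Hypothetical base measure.
    Total Revenue = SUM ( Sales[Revenue] )

    -- Year-to-date version of the same measure, sliceable by any field.
    Revenue YTD = TOTALYTD ( [Total Revenue], Dates[Date] )

    -- Same period last year, for year-over-year comparisons.
    Revenue LY = CALCULATE ( [Total Revenue], SAMEPERIODLASTYEAR ( Dates[Date] ) )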

 

With Power BI it is easy to work with time-based data, since its data modeling tools can automatically include generated fields that allow you to drill through years, quarters, months, and days with a single click. Once you create a visualization using date fields, period-based breakdowns are included automatically; for instance, Power BI can automatically separate a date column into day, month, quarter, and year.

 

Data modeling in Power BI:

  • Managing data relationships
  • Optimizing data models to create better visuals
  • Creating calculated columns
  • Creating calculated tables
  • Creating measures and working with time-based functions
  • Exploring time-based data

 

Visualizations in Power BI:

  • Creating and formatting slicers
  • Map visualizations
  • Scatter charts
  • Tables and matrixes
  • Gauge and single number cards
  • Waterfall and funnel charts

 

Power BI components:

  • Power Query: transform, merge, and filter data
  • Power View: dynamic dashboards and visualizations
  • Power Pivot: data model merging and formatting
  • Power Map: geo-visualization


Chapter 5 Applications of Data Analysis

 

Data analysis applications are used to measure and improve the performance of past and current business operations. They use collections of past data to provide tools and information that are useful to business users and that let them make significant improvements. The levels of business analytics are operational reporting, business dashboards, analytic applications, and analytics reporting. These applications may further extend into the domain of predictive analysis. Business analytic applications mostly involve analyzing business processes in support of users' decision-making; for instance, an application may address sales analysis, profitability and risk analysis, or accounts analytics.

 

Data analysis is widely used by banks to differentiate among their customers based on various characteristics, including credit risk, and to match customer characteristics with the product offerings appropriate for them. Major gaming firms use data analysis in customer loyalty programs, and wine companies use quantitative and predictive analysis to predict the appeal of wines. Basic domains of data analysis include enterprise optimization, marketing analytics, transportation analytics, telecommunications, fraud analytics, financial services analytics, pricing analytics, risk and credit analytics, health care analytics, supply chain analytics, and others. Data analytics has been used in business management since the 19th century, but it gained more attention in the late 1960s, when computers were first used in decision support systems. Since then, data analytics has changed and developed alongside enterprise resource planning, and as computers spread, business analytics exploded, bringing limitless opportunities.

 

Big Data Applications in Real Life

 

Big data is definitely taking over the world, and the importance of data analytics has grown rapidly, leading companies and industries to rely on data analysis to gain insights from various data sources and improve their business performance. Data analytics provides great insights using both semi-structured and unstructured data sources, and it helps companies mitigate risks and make smarter decisions through proper risk analysis. Many industries are propelled by data analysis, including insurance, healthcare, public sector services, industrial and natural resources, banking, and others.

 

Data analysis is also important to internet search engines like Google, Yahoo, AOL, and others, which use it to deliver the best possible results when we run a search query. Digital advertising, especially targeted advertising, uses data analytics algorithms and models to achieve a higher click-through rate (CTR) than traditional advertising, drawing on insights into customers' past purchases and behavior.

 

Internet services like Amazon also use data analytics to support their recommender systems and make suggestions to customers based on products they have already purchased. These systems help you find products that might interest you, which adds a lot to the overall user experience. Major companies also use recommendation engines to promote new products and make suggestions matched to customers' interests; to improve the user experience, internet giants like Google Play, IMDb, Twitter, LinkedIn, Netflix, and others all use recommender systems.

 

Gaming companies like Activision Blizzard, Sony, EA Sports, Zynga, Nintendo, and others have improved the overall gaming experience using big data analytics models and algorithms that improve as you move up to higher levels. In motion gaming, too, your computer opponent can analyze your past moves and shape its play accordingly.

 

Other applications of data analysis include price comparison websites, airline route planning, banking fraud and risk detection, delivery logistics, and others. From this chapter you can see what a great impact data analytics models and algorithms have, both in various scientific fields and in real-life situations. Just imagine: Google wouldn't be what it is today without data analysis techniques and modeling, and other internet search engines likewise wouldn't be what they are without predictive and data analysis.


Conclusion

 

This book has helped you gain better insight into the world of data analytics, and with some practice you will become a Power BI expert. Power BI is a very powerful tool provided by Microsoft, and you can use it for your various projects in the domain of business and data analytics. The previous chapters explained the fundamental steps toward a great business analytics solution, and you will be able to create outstanding data models with Power BI on your own. You can also share your models with other users through the Power BI service, and view and interact with solutions published by other Power BI users.

 

Reading this book, you have also gained valuable knowledge of data analysis, and in the previous chapter we saw the most important applications of data analysis in today's major companies and industries. We saw the impact of data analysis in services like healthcare, banking systems, internet search engines, game development, business management systems, digital advertising, and recommender systems, so you now have a better idea of how data analysis significantly affects scientific fields as well as real-life scenarios.

 

Here you learned the fundamentals of the data modeling process and how to communicate effectively through your visuals and data. Having reached this part of the book, you are ready to set out on your own and apply everything you have learned to your projects. This book is your guide into the world of data analysis; make the most of it by applying what you have learned and creating something that is important to you. An adventure is waiting around the corner.