Untitled @latentviewanalytics - Tumblr Blog

Handling Data at Scale Using One-Line EDA Libraries

In this era, handling data is one of the key challenges organizations face worldwide. However advanced an organization's analytics capabilities are, the first step is always exploration, where businesses need to understand, slice, and dice the data. This becomes the base for the next steps, where advanced analytics comes into the picture. Hence, the significance of exploratory data analysis is growing, and performing Exploratory Data Analysis (EDA) on large data volumes is becoming more challenging.

In one of our recent works for a leading technology firm, we performed EDA for around 5TB of data. We couldn’t proceed with Excel or any other BI tools because handling vast amounts of data is not feasible in such platforms. Hence we had to choose an alternate method. The one-line EDA libraries allow us to explore the data quickly. During this process, we explored some of the best-in-class one-line EDAs and finally figured out the best one that suited our requirements. This blog will take you through a few one-line EDAs used in various EDA use cases depending on the problem and data.

What is EDA?

Exploratory data analysis (EDA) is the first step in data science: investigating data sets before making assumptions about them. The ultimate goal of EDA is to understand what the data tells us by summarizing its main characteristics. Developed in the 1970s by American mathematician John Tukey, EDA continues to be a widely used technique for understanding data.

Why do data scientists use EDA?

Here’s a truth that all data scientists need to accept: data comes with several flaws. For example, raw data may have missing values, outliers, and duplicate values. So it is crucial to use EDA to perform graphical and non-graphical analysis to get unbiased and accurate results.

Non-Graphical Analysis includes:

- Describing data to analyze data types, min, max, mode, median, quartiles, and more
- Handling missing and duplicate data
- Outlier detection
- Understanding correlation between the variables
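As a rough illustration of the non-graphical checks listed above, here is a minimal pandas sketch. The toy DataFrame, its column names, and the 1.5 × IQR outlier rule are assumptions made for this example; they are not taken from the project described in this post.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({
    "units": [10, 12, 11, 13, 12, 95, None],
    "price": [1.0, 1.2, 1.1, 1.3, 1.2, 9.5, 1.1],
})

summary = df.describe()                  # count, mean, std, min, quartiles, max
dtypes = df.dtypes                       # data type of each column
modes = df.mode().iloc[0]                # a most frequent value per column
missing = df.isna().sum()                # number of missing values per column
dup_count = int(df.duplicated().sum())   # rows identical to an earlier row

# Outlier detection with the 1.5 * IQR rule (one common convention among several).
q1, q3 = df["units"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["units"] < q1 - 1.5 * iqr) | (df["units"] > q3 + 1.5 * iqr)]

corr = df.corr()                         # pairwise Pearson correlation

print(summary)
print(missing)
print(dup_count)
print(outliers)
print(corr)
```

Each step here is a separate call, which is the kind of repetitive work that one-line EDA libraries bundle into a single report.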

Graphical Analysis includes:

- Univariate Analysis
- Bivariate Analysis
- Multivariate Analysis

Performing EDA on terabyte-scale data, with both graphical and non-graphical analysis, requires many lines of code and is time-consuming and challenging. Hence, we bring in one-line EDA libraries that perform all these tasks in a single line of code.

What is a one-line EDA?

One-line EDA libraries are easy-to-use tools that provide a better overview of data by quickly analyzing a dataset and generating a detailed report on it, saving both time and effort.

Some of the one-line EDA libraries are:

- Sweetviz
- Autoviz
- Pandas Profiling
- D-tale

We started exploring the one-line EDA tools mentioned above, experimented with a small sample dataset on-premises, and gathered the reports.

Sweetviz

According to the Sweetviz documentation, “Sweetviz is an open-source Python library that generates beautiful, high-density visualizations to kickstart EDA (Exploratory Data Analysis) with just two lines of code. The output is a fully self-contained HTML application.”

    pip install sweetviz

    import sweetviz as sv
    report = sv.analyze(dataframe)  # dataframe: an existing pandas DataFrame
    report.show_html()              # writes the report as an HTML file

Learn more at https://www.latentview.com/blog/handling-data-at-scale-using-one-line-eda-libraries/

#EDA #Exploratory Data Analysis


An Introduction to Cryptocurrencies and Blockchain

The Dawn of Cryptocurrencies and Blockchain

Whenever a new technology emerges as the product of decades of research and hard work, people from several walks of life may doubt and criticize the revolution that it might cause. Technology leaders, on the other hand, may be astonished and embrace it. Eventually, when it gets commercialized, people may wonder why its potential wasn’t evident in the beginning. Technologies that have gone through this process include Personal Computers in the 1970s, the Internet in the 1990s, and Cryptocurrencies in the 2010s.

Evolution of technology that addresses the drawbacks in digital payments

Money has been a part of human history for several centuries in some form or the other. However, money has seen a massive revolution from the barter system of transactions to the current cryptocurrency system. With the advent of the digital age in the 21st century, money took its digital form, succeeding its previous physical form. Money evolved to a new form to address the problems in its current state and the economy.

There have always been two fundamental problems in any digital payments system.

First, if there is no easy way to verify whether an individual has sufficient funds to pay their counterpart in a transaction, that individual could engage in double spending, i.e., using the same funds in multiple fraudulent transactions, leaving the counterpart unpaid. Second, if an individual provides all the private information required to prove that they have sufficient funds, the funds might be stolen before they can be used for the transaction.

The time-tested solution to these problems is the involvement of a trusted third party, which is permitted to access the private information about the availability of funds. This ensures a safe transaction between the two parties without loss of funds or information. But these intermediaries charge hefty fees for their service, and it is estimated that the current cost to consumers for cross-border payments is 6.5%.
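To make the role of the trusted third party concrete, here is a minimal sketch of an intermediary that holds every balance and rejects a payment the sender cannot cover. The `TrustedLedger` class, the account names, and the amounts are hypothetical and invented for this illustration; a real payment intermediary is far more involved.

```python
class TrustedLedger:
    """Hypothetical intermediary that privately holds every account balance."""

    def __init__(self, balances):
        self.balances = dict(balances)

    def transfer(self, sender, receiver, amount):
        """Move funds only if the sender can cover the amount."""
        if amount <= 0 or self.balances.get(sender, 0) < amount:
            return False  # rejected: the sender does not have the funds
        self.balances[sender] -= amount
        self.balances[receiver] = self.balances.get(receiver, 0) + amount
        return True


ledger = TrustedLedger({"alice": 100, "bob": 0, "carol": 0})
first = ledger.transfer("alice", "bob", 100)     # accepted
second = ledger.transfer("alice", "carol", 100)  # rejected: the same funds were already spent
print(first, second, ledger.balances)
```

The second payment fails only because a single party sees all balances, which is exactly the concentration of private information that makes the intermediary costly and a target for attackers.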

There is also a problem that these intermediaries can be hacked, compromising vast amounts of private data. A real-life example of this would be the Equifax hack of 2017, where the private details of around 145 million Americans were compromised.

Cryptocurrency and its underlying technology, Blockchain, can address this problem by replacing a centralized trusted authority with a distributed network of users, achieving a decentralized peer-to-peer payment system.

What is a Cryptocurrency?

A cryptocurrency is a digital or virtual currency secured by cryptography, making it nearly impossible to fake or double-spend. Most cryptocurrencies are decentralized networks and are based on blockchain technology—a distributed ledger enforced by a disparate network of computers.

More precisely, cryptocurrencies facilitate payments – or other exchanges of information – between people without the oversight of a central body (like a government or a bank).

So, what is a blockchain, and how does it work?

A blockchain is a distributed database shared among the nodes of a computer network. As a database, a blockchain stores information electronically in digital format.

The primary cryptographic mechanism used in Blockchain is called a cryptographic hash function: a mathematical algorithm that maps data of arbitrary size to a bit string of fixed size.

Cryptographic hashing is a one-way function: it is computationally infeasible to recover the input from its hash. SHA-256 is a popular hash function used in cryptocurrencies, and it maps any input data to a 256-bit hash that acts as a digital fingerprint of that data.

To see how SHA-256 works, visit http://blockchain.mit.edu/hash/ . Type ‘DATA’, and the corresponding hash is ‘c97c29c7a71b392b437ee03fd17f09bb10b75e879466fc0eb757b2c4a78ac938’. Even a small change in the data results in a completely different hash. For example, type ‘DATa’, and the resulting hash is ‘09ac1e78592c65b92a29672f6978dee0c8f25a92aded1fb6fbaa8a903bbd7e9e’. However, the same data always yields the same hash value. A given hash can be reversed only through the brute-force method, i.e., trial and error over candidate inputs. This ensures that one cannot easily recreate the original message from the hash value, and it is computationally infeasible to find two different messages that produce the same hash value. In addition, this feature enables the network of peers to view the transactions occurring between any two parties.
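The same properties can be checked locally with Python's standard `hashlib` module. This is a minimal sketch: the helper names `sha256_hex` and `brute_force` are made up for this example, and the input is assumed to be UTF-8 encoded, so the printed digests may differ from the ones quoted above if the online tool encodes its input differently.

```python
import hashlib
from itertools import product
from string import ascii_uppercase


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of UTF-8 encoded text as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def brute_force(target_hex: str, length: int):
    """Try every uppercase string of the given length until one hashes to target_hex."""
    for letters in product(ascii_uppercase, repeat=length):
        candidate = "".join(letters)
        if sha256_hex(candidate) == target_hex:
            return candidate
    return None


h_upper = sha256_hex("DATA")
h_mixed = sha256_hex("DATa")

print(h_upper)
print(h_mixed)                              # one changed letter, unrelated-looking digest
print(h_upper == sha256_hex("DATA"))        # same input, same digest
print(brute_force(sha256_hex("CAT"), 3))    # feasible only because the search space is tiny
```

The brute-force search succeeds here because there are only 26³ three-letter candidates; for realistic inputs the search space is far too large to enumerate.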

Each set of new transactions is combined into one block. The data in each block of transactions produces a unique hash, or digital fingerprint, corresponding to that set of transactions. An additional piece of data included in each block is the hash of the block immediately before it on the chain.
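The chaining described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not how any particular cryptocurrency implements it: the block layout, the JSON serialization, the all-zero placeholder for the first block, and the function names are all invented for this example, and there is no mining or networking.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder for the first block, which has no predecessor


def block_hash(transactions, prev_hash):
    """Hash the block's transactions together with the previous block's hash."""
    payload = json.dumps(
        {"transactions": transactions, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain, transactions):
    """Append a block that links to the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    chain.append({
        "transactions": transactions,
        "prev_hash": prev_hash,
        "hash": block_hash(transactions, prev_hash),
    })


def is_valid(chain):
    """Recompute every hash and check each link to the preceding block."""
    prev_hash = GENESIS_PREV_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["transactions"], block["prev_hash"]):
            return False
        prev_hash = block["hash"]
    return True


chain = []
add_block(chain, [{"from": "alice", "to": "bob", "amount": 5}])
add_block(chain, [{"from": "bob", "to": "carol", "amount": 2}])
valid_before = is_valid(chain)

chain[0]["transactions"][0]["amount"] = 500  # tamper with an early block
valid_after = is_valid(chain)

print(valid_before, valid_after)
```

Because each block stores the hash of its predecessor, changing an old transaction invalidates that block's stored hash, and recomputing it would in turn break the link held by every later block.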

Read full story at https://www.latentview.com/blog/an-introduction-to-cryptocurrencies-and-blockchain/

#Blockchain #cryptocurrency
