Data Lake vs Data Warehouse

August 28, 2022

A data warehouse and a data lake are two different ways to store data. They serve different purposes and are used by people with different skill sets. Let's make the major differences between a data lake and a data warehouse clear.

Data Lake
A data lake is used to store raw data. The data may be structured, unstructured, or semi-structured.

Difference between Structured, Unstructured and Semi-Structured Data:

Structured data is organized in rows and columns, like a table. It is well organized and well managed, so it can easily be fetched from a database or data warehouse.

Unstructured data, on the other hand, is scattered and has no predefined structure. This data is mostly in the form of free text, images, audio, or video.

Semi-structured data sits in between. It does not fit a fixed table, but it carries tags or keys (as in XML or JSON) that make it reasonably easy to find data and query it.
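
To make the three categories concrete, here is a minimal Python sketch. The sample values (names, cities, tags) are invented for illustration and are not from any real dataset; the point is only how differently each kind of data has to be read.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
structured_csv = "id,name,city\n1,Asha,Pune\n2,Ravi,Delhi\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))
names_from_rows = [row["name"] for row in rows]

# Semi-structured: no fixed table, but keys label each value.
semi_structured_json = (
    '[{"id": 1, "name": "Asha", "tags": ["new"]}, {"id": 2, "name": "Ravi"}]'
)
records = json.loads(semi_structured_json)
names_from_records = [rec["name"] for rec in records]
# A field may be missing from some records, so readers need a default.
tags_per_record = [rec.get("tags", []) for rec in records]

# Unstructured: free text with no fields at all; we can only search it.
unstructured_text = "Asha from Pune called about her order. Ravi emailed from Delhi."
mentions_asha = "Asha" in unstructured_text

print(names_from_rows, names_from_records, tags_per_record, mentions_asha)
```

The structured and semi-structured samples can both be queried by field name, while the free text can only be searched, which is why it is the hardest of the three to analyze.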

A data lake holds a massive amount of data, on the scale of petabytes or more. Even so, it stays cost-effective: once data has been written into the data lake in its raw form, it can be updated easily. In a data warehouse, this trick doesn't work.

Updating data in a data warehouse is very costly. Because of the large amount of data in a data lake, analysis is difficult and time-consuming; it is only fast if the data is cataloged. Data lakes are used by data scientists and data engineers. Their major uses are big data processing and real-time analytics for live dashboards.
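
Why a catalog saves time can be sketched in a few lines of Python. This is a toy illustration only: the function name `build_catalog`, the folder layout, and the sample files are all assumptions made up for this example, and real data lakes use dedicated catalog services rather than a dictionary.

```python
import tempfile
from pathlib import Path


def build_catalog(lake_root: Path) -> dict:
    """Map each dataset name (top-level folder) to its files and total size."""
    catalog = {}
    for dataset_dir in sorted(p for p in lake_root.iterdir() if p.is_dir()):
        files = sorted(f for f in dataset_dir.rglob("*") if f.is_file())
        catalog[dataset_dir.name] = {
            "files": [f.relative_to(lake_root).as_posix() for f in files],
            "bytes": sum(f.stat().st_size for f in files),
        }
    return catalog


# Build a tiny throwaway "lake" so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    lake = Path(tmp)
    (lake / "clicks").mkdir()
    (lake / "clicks" / "2022-08-28.json").write_text('{"user": 1}')
    (lake / "invoices").mkdir()
    (lake / "invoices" / "aug.csv").write_text("id,total\n1,40\n")

    catalog = build_catalog(lake)

# With a catalog, finding a dataset is a dictionary lookup, not a full scan.
print(catalog["clicks"]["files"])
```

The scan over the folders happens once, when the catalog is built. After that, anyone looking for a dataset consults the catalog instead of walking through the whole lake again.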

Data Warehouse
In a data warehouse, data is organized in a specific, predefined structure, and specific data is used for a specific purpose only.

A data warehouse contains data mostly in structured form, and the size of the data is small compared to a data lake. Because of this smaller amount of data, analysis is much faster than in a data lake. As I mentioned earlier, updating data in a data warehouse is very costly. Data warehouses are used by data analysts, business analysts, data scientists, and machine learning engineers.
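
The idea of structured data with a fixed shape can be sketched with Python's built-in `sqlite3` module. SQLite is only a stand-in here, not a real data warehouse, and the `sales` table with its columns and values is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The table structure is declared up front, before any data is written.
conn.execute(
    """
    CREATE TABLE sales (
        id INTEGER PRIMARY KEY,
        region TEXT NOT NULL,
        amount REAL NOT NULL CHECK (amount >= 0)
    )
    """
)
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0)],
)

# Rows that break the declared structure are rejected at write time.
try:
    conn.execute("INSERT INTO sales (region, amount) VALUES (?, ?)", (None, 10.0))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Because every row has the same shape, analysis is a short query.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals, rejected)
conn.close()
```

Because the structure is enforced when data is written, every query can rely on it, which is part of why analysis on this kind of data is quick.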

There is a bigger story behind data lakes and data warehouses: how data is transferred from one place to another. The medium for that transfer is called a data pipeline. In that topic I will also cover the data quality that a data engineer must ensure before working with the data. This will be covered in the next blog, and I will attach the link in this blog as well. Till then, keep aiming, keep practicing.


