What Is Data Lake Architecture?
Now let’s add a data lake. A data lake is a central repository that stores vast volumes of both structured and unstructured data “as-is,” in its native format. Since this is a newer term, we will talk about it in more detail. Data lakes began to gain traction in the 2000s as a more cost-effective way to store unstructured data. The key phrase here is cost-effectiveness.
Data lake example
Going back to the grocery store example we used with the data warehouse, you might consider adding a data lake to the mix when you need a way to store big data. Think about the social media sentiment you collect or the results of your ads. All of this is unstructured but valuable, and it can be stored in a data lake that works alongside both your data warehouse and your database.
- Note 1: Having a data lake does not mean that you can simply load your data willy-nilly. Doing so leaves you with a huge, disorganized mass of data (often called a data swamp). Still, a data lake does simplify storage, and newer technologies, such as the data catalog, will continue to make it easier to find and use the data in your data lake.
- Note 2: If you would like more information on the ideal data lake architecture, you can read the full article that we have written on this topic. It explains why you want your data lake to be built with Object Storage and Apache Spark, not Hadoop.
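To make the "as-is" storage and data catalog ideas above more concrete, here is a minimal sketch in Python. It is not a real data lake: a local folder stands in for object storage, and the names `ingest`, `find`, and `catalog.jsonl` are hypothetical, chosen only for illustration. The point is simply that files land unchanged, while a small catalog entry records what each file is so it can be found later.

```python
from __future__ import annotations

import json
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def ingest(lake_root: Path, source_file: Path, dataset: str, description: str) -> dict:
    """Copy a file into the lake unchanged and record it in a JSON-lines catalog."""
    target_dir = lake_root / "raw" / dataset
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source_file.name
    shutil.copy2(source_file, target)  # stored "as-is": no parsing, no schema

    # Hypothetical catalog entry: just enough metadata to find the file again.
    entry = {
        "dataset": dataset,
        "path": str(target.relative_to(lake_root)),
        "format": source_file.suffix.lstrip(".") or "unknown",
        "bytes": target.stat().st_size,
        "description": description,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    with (lake_root / "catalog.jsonl").open("a", encoding="utf-8") as catalog:
        catalog.write(json.dumps(entry) + "\n")
    return entry


def find(lake_root: Path, dataset: str) -> list[dict]:
    """Return every catalog entry recorded for the given dataset name."""
    catalog_path = lake_root / "catalog.jsonl"
    if not catalog_path.exists():
        return []
    with catalog_path.open(encoding="utf-8") as catalog:
        entries = [json.loads(line) for line in catalog if line.strip()]
    return [entry for entry in entries if entry["dataset"] == dataset]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        # Made-up sample file standing in for collected social sentiment.
        sample = workdir / "posts.json"
        sample.write_text('{"text": "love the new bakery aisle"}', encoding="utf-8")

        lake = workdir / "lake"
        ingest(lake, sample, "social_sentiment", "Raw social media posts")
        for entry in find(lake, "social_sentiment"):
            print(entry["path"], entry["format"], entry["bytes"])
```

Without the catalog step, the lake would still accept the file, but nothing would describe it, which is the "willy-nilly" loading that Note 1 warns against.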