Data Lake

A data lake is a system or repository of data stored in its natural (raw) format, usually as object blobs or files. It is usually a single store for all enterprise data, including raw copies of source system data and transformed data used for tasks such as reporting, visualization, analytics and machine learning. A data lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and binary data (images, audio, video).
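To make the four data categories above concrete, here is a minimal Python sketch that sorts incoming object keys into them by file extension. The extension table, the example keys and the "unclassified" fallback are assumptions made for illustration. A real lake would more likely rely on content inspection or ingestion metadata, and structured data usually arrives through a database export rather than a recognisable extension.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to the categories named above.
# Structured (relational) data is deliberately absent: it normally arrives
# via a database export job, not a file extension we can rely on.
CATEGORY_BY_EXTENSION = {
    ".csv": "semi-structured",
    ".log": "semi-structured",
    ".xml": "semi-structured",
    ".json": "semi-structured",
    ".eml": "unstructured",
    ".docx": "unstructured",
    ".pdf": "unstructured",
    ".png": "binary",
    ".jpg": "binary",
    ".mp3": "binary",
    ".mp4": "binary",
}


def classify(key: str) -> str:
    """Return the assumed category for an object key, based on its extension."""
    suffix = PurePosixPath(key).suffix.lower()
    return CATEGORY_BY_EXTENSION.get(suffix, "unclassified")


def group_by_category(keys):
    """Group object keys by category, preserving input order within each group."""
    groups = {}
    for key in keys:
        groups.setdefault(classify(key), []).append(key)
    return groups


if __name__ == "__main__":
    sample_keys = [
        "raw/web/clicks.json",
        "raw/mail/inbox/0001.eml",
        "raw/cctv/frame-17.png",
        "raw/misc/README",
    ]
    for category, members in group_by_category(sample_keys).items():
        print(category, members)
```

Keys with an unknown or missing extension fall into "unclassified" rather than being rejected, which matches the idea that a lake accepts data in whatever form it arrives.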

Sample Architecture

Hadoop Data Lake
A Hadoop data lake is a data management platform comprising one or more Hadoop clusters. It is used principally to process and store nonrelational data, such as log files, internet clickstream records, sensor data, JSON objects, images and social media posts. Such systems can also hold transactional data pulled from relational databases, but they're designed to support analytics applications, not to handle transaction processing. As public cloud platforms have become common sites for data storage, many people build Hadoop data lakes in the cloud.
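As an illustration of the kind of analytics workload described above, the following sketch counts page views from JSON clickstream records in a map-then-reduce style. It is plain Python and does not use any Hadoop API. The "page" field name and the sample records are assumptions, and a real cluster would distribute both steps across many nodes.

```python
import json
from collections import Counter


def map_record(line):
    """Map step: emit (page, 1) for one JSON clickstream line.

    Malformed lines and records without a "page" field (a hypothetical
    field name) produce no output instead of failing the whole job.
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return []
    if not isinstance(record, dict):
        return []
    page = record.get("page")
    return [(page, 1)] if page else []


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def count_page_views(lines):
    """Run the map step over every line, then reduce the emitted pairs."""
    pairs = (pair for line in lines for pair in map_record(line))
    return reduce_counts(pairs)


if __name__ == "__main__":
    sample = [
        '{"page": "/home", "user": "u1"}',
        '{"page": "/cart", "user": "u2"}',
        "corrupted line",
        '{"page": "/home", "user": "u3"}',
    ]
    print(count_page_views(sample))
```

Skipping unreadable lines rather than raising reflects the analytics focus: raw log data is often messy, and one bad record should not stop an aggregate from being computed.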

Azure Data Lake
Azure Data Lake includes the capabilities developers, data scientists and analysts need to store data of any size and shape, at any speed, and to run all types of processing and analytics across platforms and languages. It removes much of the complexity of ingesting and storing data, and makes it faster to get up and running with batch, streaming and interactive analytics. Azure Data Lake works with existing IT investments for identity, management and security, which simplifies data management and governance. It also integrates with operational stores and data warehouses, so current data applications can be extended. The service draws on Microsoft's experience of working with enterprise customers and of running large-scale processing and analytics for its own businesses, such as Office 365, Xbox Live, Azure, Windows, Bing and Skype. It is aimed at the productivity and scalability challenges that prevent organisations from maximising the value of their data assets.

AWS Data Lake
Once data is ready for the cloud, AWS can store it in any format, securely and at massive scale, using Amazon S3 and Amazon Glacier. To help end users discover the relevant data for their analysis, AWS Glue automatically creates a single catalog that users can search and query.
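The idea of a single searchable catalog can be sketched in a few lines of Python. This example does not call AWS Glue or Amazon S3. It walks a local directory that stands in for an object store, and the function names, entry fields and sample files are all assumptions chosen for illustration.

```python
import os
import tempfile


def build_catalog(root):
    """Walk a directory tree and record one catalog entry per file.

    The entry fields (key, format, size_bytes) are hypothetical; a managed
    catalog service would typically also record schema and partition details.
    """
    catalog = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            extension = os.path.splitext(name)[1].lstrip(".").lower()
            catalog.append({
                "key": os.path.relpath(path, root).replace(os.sep, "/"),
                "format": extension or "unknown",
                "size_bytes": os.path.getsize(path),
            })
    # Sort by key so the catalog order does not depend on the filesystem.
    return sorted(catalog, key=lambda entry: entry["key"])


def search(catalog, term):
    """Return entries whose key contains the term, ignoring case."""
    term = term.lower()
    return [entry for entry in catalog if term in entry["key"].lower()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "sales"))
        with open(os.path.join(root, "sales", "orders.csv"), "wb") as handle:
            handle.write(b"id,total\n1,9.5\n")
        for entry in search(build_catalog(root), "orders"):
            print(entry)
```

Keeping the catalog separate from the data itself is the design point: users query the small index to find what exists, then read only the objects they need.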