How to Build a Data Lake With AWS

Derek Hutson
6 min read · Mar 7, 2023

Use S3, Glue, and Lake Formation.

The sheer scale of modern-day data can be overwhelming. It arrives in many forms: pictures, text files, streaming video, codebases, and more.

A common issue businesses face is how to store this data so it can be cleaned, processed, and analyzed.

Relational (SQL) and NoSQL databases work well when your data has structure. But what happens when you are dealing with a substantial amount of unstructured data?

Enter data lakes.

A data lake is a centralized repository where you can store all your structured and unstructured data at any scale.
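
To make "centralized repository" a little more concrete: on S3, very different kinds of files can sit side by side in one bucket, told apart only by their object keys. Below is a minimal sketch of one possible key layout. The `raw` zone name, the `lake_key` helper, and the date-partitioned prefix are illustrative assumptions, not something S3 requires.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical zone name for unprocessed data; pick whatever fits your setup.
RAW_ZONE = "raw"


def lake_key(source: str, filename: str, ingested: date, zone: str = RAW_ZONE) -> str:
    """Build an object key like raw/<source>/year=YYYY/month=MM/day=DD/<filename>."""
    return str(
        PurePosixPath(zone)
        / source
        / f"year={ingested.year:04d}"
        / f"month={ingested.month:02d}"
        / f"day={ingested.day:02d}"
        / filename
    )


if __name__ == "__main__":
    day = date(2023, 3, 7)
    # Structured and unstructured files land in the same lake, under different prefixes.
    for source, name in [("images", "cat.png"), ("logs", "app.txt"), ("orders", "orders.csv")]:
        print(lake_key(source, name, day))
```

Grouping keys by source and ingestion date is only one convention. Its appeal is that data you cannot process right away stays easy to find later.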

These come in handy when you have a large amount of data you need to store, but you can’t process it all immediately.

Once your data is processed, you can run different types of analytics on it, such as dashboards and real-time analytics.

In this article, I’ll show you how to set up a simple data lake to introduce you to what it looks like and how it works.

  • As a disclaimer, I ran into quite a few errors while creating this setup, and most of them were permissions-related. So we will grant the IAM role being used less constrained…
