A data pipeline is a way of moving data from one end to another, usually with some modifications along the way. It boils down to a set of instructions that extract data, or push it from one end to the other, until it reaches its goal. Sometimes data is aggregated in one location and, when the time is right, a trigger launches the extraction process.
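
To make the idea concrete, here is a minimal sketch of a pipeline as an ordered set of instructions: extract, transform, load, invoked by some trigger. The source, destination and field names are assumptions made up for this example, not part of the project.

```python
# A minimal pipeline sketch: the source and destination are hypothetical
# in-memory stand-ins for whatever systems the real pipeline connects.

def extract(source):
    """Pull raw records out of the source system."""
    return list(source)

def transform(records):
    """Apply modifications along the way, e.g. drop incomplete readings."""
    return [r for r in records if r.get("value") is not None]

def load(records, destination):
    """Push the result to the other end."""
    destination.extend(records)

def run_pipeline(source, destination):
    """A trigger (scheduler, file arrival, API call) would invoke this."""
    load(transform(extract(source)), destination)

if __name__ == "__main__":
    source = [{"value": 1.2}, {"value": None}, {"value": 3.4}]
    destination = []
    run_pipeline(source, destination)
    print(destination)  # [{'value': 1.2}, {'value': 3.4}]
```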

All of these processes are automated: merging data from one database into another, moving and correcting data, adding, dividing, averaging, everything.

Building automated jobs is almost synonymous with building a data pipeline. Not only does this accelerate the data flow, it also lets us build abstraction layers. Automation also helps us filter out the important data and deliver it in the right format.
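
As an illustration of such an automated job, the sketch below aggregates a batch of readings and delivers the result in a fixed format on a schedule. The field names, interval and JSON output are assumptions for the example; in production a scheduler such as cron or Airflow would own the loop.

```python
import json
import statistics
import time

def fetch_readings():
    """Stand-in for pulling the latest batch from a sensor or database."""
    return [{"sensor": "temp-1", "value": v} for v in (21.0, 21.4, 20.8)]

def summarize(readings):
    """Aggregate the batch: here, a simple average per run."""
    return {
        "sensor": readings[0]["sensor"],
        "mean": statistics.mean(r["value"] for r in readings),
        "count": len(readings),
    }

def deliver(summary):
    """Deliver in the agreed format; JSON to stdout stands in for a real sink."""
    print(json.dumps(summary))

def main(interval_seconds=60, runs=3):
    """A real scheduler would replace this loop."""
    for _ in range(runs):
        deliver(summarize(fetch_readings()))
        time.sleep(interval_seconds)

if __name__ == "__main__":
    main(interval_seconds=1)
```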


Handling data flow is not an easy task, and it involves quite a bit of software. It is also important to build the pipeline in a way that it can handle the given load and deliver correct data to the customer.

The key objective is to create a system that connects data generators and data consumers and provides an interface to actuators. Along the way, it should also optimize storage and actuator utilization.
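
One way to picture that objective is a small hub that routes readings from generators to consumers and forwards commands to actuators. All class and method names below are assumptions made for this sketch, not the project's actual design.

```python
from typing import Callable, List

class DataGenerator:
    """Anything that produces readings (sensors, logs, user events)."""
    def __init__(self, read: Callable[[], float]):
        self.read = read

class Actuator:
    """Anything the system can drive based on the data it sees."""
    def __init__(self, name: str):
        self.name = name
    def apply(self, command: float) -> None:
        print(f"{self.name} <- {command}")

class Hub:
    """Connects generators to consumers and exposes an actuator interface."""
    def __init__(self):
        self.generators: List[DataGenerator] = []
        self.consumers: List[Callable[[float], None]] = []
        self.actuators: List[Actuator] = []

    def poll(self) -> None:
        """Push each generator's latest value to every consumer."""
        for gen in self.generators:
            value = gen.read()
            for consume in self.consumers:
                consume(value)

    def command(self, value: float) -> None:
        """Forward a command to every registered actuator."""
        for actuator in self.actuators:
            actuator.apply(value)

if __name__ == "__main__":
    hub = Hub()
    hub.generators.append(DataGenerator(lambda: 22.5))
    hub.consumers.append(lambda v: print(f"consumer saw {v}"))
    hub.actuators.append(Actuator("valve-1"))
    hub.poll()
    hub.command(0.8)
```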

The project consists of:

The reasons for making the project are several:

These are questions with multi-level solutions, and answering them will take a lot of analytics and further investigation.

It is obvious that the data flow for this project offers an opportunity to involve several machine learning systems, and my job is to dive into the data and expose some of the facts we can benefit from. Data party coming up!

Photo by Moritz Mentges on Unsplash