What is Data?
Before jumping into data engineering, let's start with what data actually is. In today's world, data is everything and everything is data. While we scroll through social media, play games online, or surf the internet, terabytes of data are being processed and stored. Data is now an asset, and like any asset it needs to be managed well so that we can get as much benefit from it as possible.
Data Lifecycle
When we look at the impact of data on every gadget and piece of technology we use, we rarely think about how the data reaches us in its final form, or where it comes from. Data goes through many processes before it can be used in the real world. Some data is used as soon as it is created, as in real-time systems, whereas other data is better processed in batches at certain time intervals.
First, data is collected from various sources, then it is ingested into the pipeline, and then it is stored. The data is then processed, i.e. converted into the form that users need, and used for analysis. The processing, storing, and analyzing steps are iterative, depending on the needs of the company and its business requirements.
fig. Diagram to visualize the cycle.
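To make the cycle a little more concrete, here is a minimal sketch of the collect, ingest, store, process, and analyze steps in Python. Everything in it is an assumption made for illustration: the sample events, the `events` table, and the function names are hypothetical, and an in-memory SQLite database stands in for whatever storage a real pipeline would use.

```python
import json
import sqlite3
import statistics

# Hypothetical raw records standing in for data collected from several sources.
RAW_EVENTS = [
    '{"source": "app", "user_id": "a", "seconds_online": 120}',
    '{"source": "web", "user_id": "b", "seconds_online": 300}',
    '{"source": "app", "user_id": "a", "seconds_online": 60}',
]


def ingest(raw_lines):
    """Ingestion: parse raw JSON text into Python dictionaries."""
    return [json.loads(line) for line in raw_lines]


def store(conn, records):
    """Storage: persist the ingested records in a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(source TEXT, user_id TEXT, seconds_online INTEGER)"
    )
    conn.executemany(
        "INSERT INTO events VALUES (:source, :user_id, :seconds_online)",
        records,
    )
    conn.commit()


def process(conn):
    """Processing: reshape stored rows into total time online per user."""
    rows = conn.execute(
        "SELECT user_id, SUM(seconds_online) FROM events "
        "GROUP BY user_id ORDER BY user_id"
    ).fetchall()
    return {user_id: total for user_id, total in rows}


def analyze(totals):
    """Analysis: derive one simple figure, the average time per user."""
    return statistics.mean(list(totals.values()))


def run_pipeline(raw_lines):
    """Run the steps in order against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        store(conn, ingest(raw_lines))
        totals = process(conn)
        return totals, analyze(totals)
    finally:
        conn.close()


if __name__ == "__main__":
    totals, average = run_pipeline(RAW_EVENTS)
    print(totals)
    print(average)
```

In a real system each function would likely be a separate job or service, and the process and analyze steps would be rerun as requirements change, which is the iterative part of the cycle.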
Data Engineering
In simple terms, data engineering is a discipline within data management that focuses specifically on designing, constructing, and maintaining the systems and infrastructure necessary for the collection, storage, processing, and analysis of large volumes of data. As I already mentioned, this is the era of data, or should I say big data, so data engineers should be able to handle terabytes of data or more, and depending on the requirements there are different tools to complete the task.
Some typical tasks involved in data engineering are:
- Data Ingestion
- Data Transformation
- Data Storage
- Data Processing
- Data Quality
- Performance optimization
- Data Security
It's quite amazing to see how data engineers play a vital role in deriving insights from data. Data engineers use various tools and techniques to develop data pipelines ranging from simple to complex.
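As a rough illustration of two tasks from the list above, data transformation and data quality, the sketch below cleans a handful of weather readings. The record layout (`city`, `temp`, `unit`), the `Reading` class, and the accepted temperature range are all assumptions chosen for this example, not a standard.

```python
from dataclasses import dataclass

# Hypothetical raw input: inconsistent casing, mixed units, and bad values.
SAMPLE = [
    {"city": " kathmandu ", "temp": "68", "unit": "F"},
    {"city": "Pokhara", "temp": "22.5"},
    {"city": "", "temp": "15"},
    {"city": "Lalitpur", "temp": "not a number"},
    {"city": "Biratnagar", "temp": "999"},
]


@dataclass
class Reading:
    city: str
    temperature_c: float


def transform(raw):
    """Transformation: tidy the city name and convert Fahrenheit to Celsius."""
    city = raw["city"].strip().title()
    temp = float(raw["temp"])
    if raw.get("unit", "C").upper() == "F":
        temp = (temp - 32) * 5 / 9
    return Reading(city=city, temperature_c=round(temp, 1))


def is_valid(reading):
    """Quality check: reject empty names and implausible temperatures."""
    return bool(reading.city) and -90.0 <= reading.temperature_c <= 60.0


def clean(raw_records):
    """Return (good readings, rejected raw records)."""
    good, rejected = [], []
    for raw in raw_records:
        try:
            reading = transform(raw)
        except (KeyError, ValueError, TypeError, AttributeError):
            # The record could not be parsed at all.
            rejected.append(raw)
            continue
        if is_valid(reading):
            good.append(reading)
        else:
            rejected.append(raw)
    return good, rejected


if __name__ == "__main__":
    good, rejected = clean(SAMPLE)
    print(good)
    print(len(rejected), "records rejected")
```

Keeping the rejected records instead of silently dropping them is a deliberate choice here: it lets someone inspect what failed and why, which is usually part of monitoring data quality.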
Tools
- AWS Glue
- Amazon Kinesis
- Apache Hadoop
- Apache Airflow
- Apache Kafka
- Apache Spark
- Google Cloud
- Snowflake
- Microsoft Azure Data Factory
- MySQL
Conclusion
Overall, we can see how much data influences our day-to-day lives and how much we depend on it. Imagine getting the weather data of each and every country on your mobile screen, versus getting only the filtered weather data for your particular area. Which one is more convenient? This is a small example of why handling data in the correct way is so important. All of this generally comes under data engineering.