
This case study is a comprehensive analysis of Chicago crime data from 2001 to the present, using various data engineering tools and practices.

Table of Contents

* About
* Usage
* Deployment
* Built Using
* Authors
* Acknowledgements
* Comparison of Practices
* Project Organization
* Development Updates

About

The purpose of this project is to analyze trends and patterns in Chicago crime data.

Usage

To use this project, follow these steps (a typical workflow inferred from the Makefile targets listed under Project Organization; your exact commands may differ):
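
1. Clone the repository and `cd` into it.
2. Install the Python dependencies (for example, `pip install -r requirements.txt`, assuming the standard cookiecutter requirements file).
3. Run `make data` to build the processed datasets.
4. Run `make train` to run the modeling step.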

Deployment

For deployment, we will use AWS or GCP to scale the data processing tasks and handle large datasets.
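
As a sketch of what the AWS path could look like (the region, instance types, bucket, and script path below are hypothetical placeholders, not values from this project), a transient EMR cluster can run a Spark job and terminate when it finishes:

```python
import boto3

# Launch a transient EMR cluster that runs one Spark step and then shuts down.
emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="chicago-crime-processing",
    ReleaseLabel="emr-6.10.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": False,  # terminate after the step completes
    },
    Steps=[
        {
            "Name": "process-crime-data",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",  # EMR's built-in command runner
                "Args": ["spark-submit", "s3://example-bucket/jobs/process_crimes.py"],
            },
        }
    ],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Started cluster:", response["JobFlowId"])
```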

Built Using

This project was built using various data engineering tools and practices, including:

* Airflow
* AWS/GCP for deployment
* DBT for SQL-based data transformations
* PySpark and PyFlink for processing large datasets (a short PySpark sketch follows this list)
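
As a concrete example of the PySpark side, here is a minimal sketch of the kind of aggregation this analysis relies on; the input path and the column names ("Year", "Primary Type") follow the Chicago Data Portal export and are assumptions, not confirmed details of this repository:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("chicago-crime-trends").getOrCreate()

# Load the raw extract (path and column names are assumptions).
crimes = spark.read.csv(
    "data/raw/crimes_2001_to_present.csv", header=True, inferSchema=True
)

# Count incidents per year and offense type to surface long-term trends.
trends = (
    crimes.groupBy("Year", "Primary Type")
    .agg(F.count("*").alias("incidents"))
    .orderBy("Year", F.desc("incidents"))
)

# Persist the result as a processed, analysis-ready dataset.
trends.write.mode("overwrite").parquet("data/processed/crime_trends.parquet")
```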

Authors

Acknowledgements

Comparison of Practices

We have compared the performance, scalability, and ease of use of each of these tools when handling large datasets; this section provides an overview of our findings.
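
As an illustration of how such timings can be collected, here is a minimal harness that measures a representative PySpark aggregation; the input path and the job under test are hypothetical, and an equivalent PyFlink job would be timed the same way for a like-for-like comparison:

```python
import time

from pyspark.sql import SparkSession


def time_run(label, fn, repeats=3):
    """Run fn several times and report the best wall-clock time."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    print(f"{label}: best of {repeats} runs = {min(timings):.2f}s")


spark = SparkSession.builder.appName("tool-comparison").getOrCreate()

# Hypothetical input; cache and materialize it so each timed run measures
# compute rather than disk I/O.
crimes = spark.read.csv(
    "data/raw/crimes_2001_to_present.csv", header=True, inferSchema=True
)
crimes.cache().count()

# collect() forces full execution of the aggregation being measured.
time_run("pyspark groupBy", lambda: crimes.groupBy("Primary Type").count().collect())
```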

Project Organization

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│   ...
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│   ...
└── src                <- Source code for use in this project.

Development Updates


Task 1
