Last modified: 21.01.2021 By Linh Truong([email protected])
This homework is not graded.
The goal of this task is to help you become familiar with the dynamic provisioning of big data platform components using cloud technologies. We choose MongoDB as the component for this task because it does not require much effort to deploy and test. MongoDB is a common NoSQL database. You can run MongoDB in a Docker container. You can also run a MongoDB replica set using Docker Compose.
- Set up Docker and get the MongoDB Docker image
- Deploy a MongoDB instance using Docker
- Write a program with three functions: (i) test whether a MongoDB instance is running, (ii) kill/stop a MongoDB instance, and (iii) start a MongoDB instance
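One possible shape for the three functions is sketched below. It is only a starting point, not a reference solution. It assumes that the `docker` CLI is on the PATH and that the instance runs as a single container. The container name `mongodb-hw` and the helper names (`is_running`, `start_mongodb`, `stop_mongodb`) are made up for illustration. The `runner` parameter is there so that the functions can be exercised without a Docker daemon.

```python
import subprocess

CONTAINER = "mongodb-hw"  # hypothetical container name, pick your own
IMAGE = "mongo"           # MongoDB image name on Docker Hub
PORT = 27017              # default MongoDB port


def _run(args, runner=subprocess.run):
    """Run a docker CLI command and return (exit_code, stripped stdout)."""
    result = runner(args, capture_output=True, text=True)
    return result.returncode, result.stdout.strip()


def is_running(name=CONTAINER, runner=subprocess.run):
    """Return True if the named container exists and is in the running state."""
    code, out = _run(
        ["docker", "inspect", "--format", "{{.State.Running}}", name], runner
    )
    # docker inspect exits with a non-zero code if the container does not exist.
    return code == 0 and out == "true"


def start_mongodb(name=CONTAINER, runner=subprocess.run):
    """Start the instance: restart a stopped container or create a new one."""
    if is_running(name, runner):
        return True
    code, _ = _run(["docker", "start", name], runner)
    if code != 0:
        # docker start failed, most likely because no container with this
        # name exists yet, so create one from the image.
        code, _ = _run(
            ["docker", "run", "-d", "--name", name, "-p", f"{PORT}:{PORT}", IMAGE],
            runner,
        )
    return code == 0


def stop_mongodb(name=CONTAINER, runner=subprocess.run):
    """Stop the container; return True if docker reported success."""
    code, _ = _run(["docker", "stop", name], runner)
    return code == 0
```

Calling `start_mongodb()`, then `is_running()`, then `stop_mongodb()` exercises all three functions in order. Note that `is_running` only checks the container state. Checking that the server actually accepts connections would need a MongoDB client such as pymongo.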
We do not assume that you have mastered MongoDB. If you do not know MongoDB, you can still do the homework, as it is mainly about managing services (for big data platforms). Our tutorial code has some parts dealing with MongoDB that you might want to look at:
Assume that you take data from the Airbnb Dataset and combine it with crime data (e.g., from the government) to recommend accommodations. Which data concerns (e.g., accuracy, price, license) are important?
Consider that your big data platform must support the analysis of Avian Vocalizations from CA & NV, USA. Would you consider using different types of data storage/databases, where each one (e.g., a database or a file store) would store only one type of data?
For storing the BTS data, should we partition data based on the station or the timestamp of the data?
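To make the two options concrete, the sketch below groups a few readings both ways. It does not answer the question. The record layout (station id, ISO timestamp, value) and the sample values are hypothetical and are not taken from the actual BTS dataset.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical readings: (station_id, ISO timestamp, value).
READINGS = [
    ("station-1", "2021-01-20T10:00:00", 21.5),
    ("station-2", "2021-01-20T10:05:00", 19.0),
    ("station-1", "2021-01-21T09:30:00", 22.1),
]


def partition_by_station(readings):
    """Group readings by station id, one partition per station."""
    parts = defaultdict(list)
    for reading in readings:
        parts[reading[0]].append(reading)
    return dict(parts)


def partition_by_day(readings):
    """Group readings by calendar day of the timestamp, one partition per day."""
    parts = defaultdict(list)
    for reading in readings:
        day = datetime.fromisoformat(reading[1]).date().isoformat()
        parts[day].append(reading)
    return dict(parts)
```

With station-based partitions, all history for one station sits together. With time-based partitions, all stations for one period sit together. Which layout fits better depends on the queries and the write pattern you expect.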
Given BTS monitoring (e.g., the BTS data), do you think we need to distribute data and analysis across multiple places?