yo006yo/task1/quotation.md
louiscklaw a786e870b8 update,
2025-02-01 02:11:37 +08:00


tags: pending, kaggle

task1

Brief

You are required to collect open data and real-time data.

Part 1: Data analysis in a Jupyter notebook and suggested actionable items

  • Download the "Top 200 common passwords by country 2021" dataset from www.kaggle.com.
  • Clean and rearrange the data as necessary.
  • Visualize the data with 8 or more charts, written in Python in a Jupyter notebook.
  • The charts must include a sunburst chart, a heat map, and a pair plot.
  • At least one 3D chart is required.
  • At least one map, such as a Plotly choropleth map, must be displayed.
  • Analyze the charts (and the data) to identify the facts they reveal.
  • Provide insights and suggest actionable items.
  • (You may add other related data set(s) to enrich your insights and suggestions.)
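The reshaping step can be sketched in plain Python before the result is handed to a plotting library. This is a minimal sketch under stated assumptions: each row is assumed to be a `(country, rank, password)` tuple, which may not match the actual Kaggle file's columns, and `classify` is a hypothetical, deliberately crude password categorization added only to give the sunburst a middle level. `sunburst_records` produces the long, one-row-per-leaf format a sunburst hierarchy expects, and `heatmap_matrix` produces a country x category count matrix for a heat map.

```python
from collections import Counter, defaultdict


def classify(password: str) -> str:
    """Hypothetical, crude categorization; not part of the dataset."""
    if password.isdigit():
        return "numeric"
    if password.isalpha():
        return "alphabetic"
    return "alphanumeric/other"


def sunburst_records(rows):
    """Flatten (country, rank, password) rows into one record per
    country -> category -> password path, with a count per leaf."""
    counts = Counter((country, classify(pw), pw) for country, _rank, pw in rows)
    return [
        {"country": c, "category": cat, "password": pw, "count": n}
        for (c, cat, pw), n in sorted(counts.items())
    ]


def heatmap_matrix(rows):
    """Return (countries, categories, matrix) where matrix[i][j] is the number
    of passwords from countries[i] that fall into categories[j]."""
    table = defaultdict(Counter)
    for country, _rank, pw in rows:
        table[country][classify(pw)] += 1
    countries = sorted(table)
    categories = sorted({cat for cnt in table.values() for cat in cnt})
    # Counter returns 0 for categories a country never uses.
    matrix = [[table[c][cat] for cat in categories] for c in countries]
    return countries, categories, matrix
```

Loaded into a pandas DataFrame, the records could be plotted with `plotly.express.sunburst(df, path=["country", "category", "password"], values="count")`, and the matrix with `plotly.express.imshow(matrix, x=categories, y=countries)` or `seaborn.heatmap`. Check the real column names first, and replace `classify` with whatever grouping the analysis actually needs.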

Part 2: Real-time data processing and visualization in a Jupyter notebook

  • Use the HK hospital Accident and Emergency (A&E) waiting time API to collect real-time data, and store it in a NoSQL database (e.g., MongoDB).
  • Collect data for 3 or more days within November and/or December.
  • Collect at intervals of 15 minutes or less.
  • Create a Jupyter notebook that reads the data into a pandas DataFrame.
  • (You may first export the data to a JSON file using MongoDB Compass.)
  • Process and visualize the data.
  • Produce 3 or more charts.
  • You are also encouraged to use Python 3D visualization techniques.
  • Analyze the charts (and the data) to identify the facts they reveal.
  • Provide insights / comments / suggestions.
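The collection requirements can be sketched as a small polling loop. This is a minimal sketch, not a reference implementation: `API_URL` is a hypothetical placeholder to be replaced with the real A&E waiting time endpoint, and the 10-minute interval is a chosen value, not one given in the brief. Each response is stored unmodified under `payload` together with a collection timestamp in Hong Kong time, so the sketch assumes no particular field names in the API response. The loop runs on a fixed schedule so request latency does not accumulate across polls, and `check_coverage` tests a set of timestamps against the brief's duration, frequency, and month requirements.

```python
import json
import time
import urllib.request
from datetime import datetime, timedelta, timezone

# Hypothetical placeholder: replace with the real A&E waiting time endpoint.
API_URL = "https://example.invalid/aed-waiting-time.json"
# 10 minutes: below the brief's 15-minute ceiling, leaving margin for delays.
POLL_SECONDS = 10 * 60
HKT = timezone(timedelta(hours=8))  # Hong Kong time, UTC+8


def fetch_snapshot(url=API_URL):
    """Download and decode one JSON response from the API."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def collect(store, fetch=fetch_snapshot, polls=None, interval=POLL_SECONDS,
            now=lambda: datetime.now(HKT), sleep=time.sleep, clock=time.monotonic):
    """Poll `fetch` on a fixed schedule and pass one document per successful poll
    to `store` (e.g. a MongoDB collection's insert_one). polls=None polls forever.
    Returns the number of polls attempted."""
    done = 0
    next_due = clock()
    while polls is None or done < polls:
        collected_at = now()  # timestamp taken before the request
        try:
            # Store the raw payload so no API field names are assumed here.
            store({"collected_at": collected_at.isoformat(), "payload": fetch()})
        except (OSError, ValueError) as exc:  # network or JSON decoding failures
            print(f"poll at {collected_at.isoformat()} failed: {exc}")
        done += 1
        next_due += interval  # fixed rate: latency does not accumulate
        if polls is None or done < polls:
            sleep(max(0.0, next_due - clock()))
    return done


def check_coverage(timestamps, max_gap=timedelta(minutes=15),
                   min_span=timedelta(days=3), months=(11, 12)):
    """Return (span >= min_span, every gap <= max_gap, all within `months`)."""
    ts = sorted(timestamps)
    if not ts:
        return (False, False, False)
    span_ok = ts[-1] - ts[0] >= min_span
    gaps_ok = all(b - a <= max_gap for a, b in zip(ts, ts[1:]))
    months_ok = all(t.month in months for t in ts)
    return (span_ok, gaps_ok, months_ok)
```

With pymongo installed, `store` could be `pymongo.MongoClient()["hk_aed"]["waiting_times"].insert_one` (hypothetical database and collection names), with `collect(store)` left running for the whole collection window. Afterwards, `pandas.json_normalize(list(collection.find({}, {"_id": 0})))` flattens the stored documents into a DataFrame, and `check_coverage` applied to the parsed `collected_at` values (for example via `datetime.fromisoformat`) shows whether failed polls left any gap longer than 15 minutes.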

Deliverables should include:

  • Exported collection(s) of the open data and samples of the real-time data, from MongoDB
  • Jupyter notebooks that visualize and analyze the datasets, with summaries, conclusions, and related discussion in Markdown
  • A video that demonstrates the data collection process and presents all results and insights
  • Upload everything to Moodle no later than 1 week after the last lesson.