

      Posted: 2024-03-12



      IEMS 5730 Spring 2024 Homework 2
      Release date: Feb 23, 2024
      Due date: Mar 11, 2024 (Monday) 11:59:00 pm
      We will discuss the solution soon after the deadline. No late homework will be accepted!
      Every student MUST include the following statement, together with his/her signature, in the
      submitted homework.
      I declare that the assignment submitted on Elearning system is original
      except for source material explicitly acknowledged, and that the same or
      related material has not been previously submitted for another course. I
      also acknowledge that I am aware of University policy and regulations on
      honesty in academic work, and of the disciplinary guidelines and
      procedures applicable to breaches of such policy and regulations, as
      contained in the website
      http://www.cuhk.edu.hk/policy/academichonesty/.
      Signed (Student_________________________) Date:______________________________
      Name_________________________________ SID_______________________________
      Submission notice:
      ● Submit your homework via the elearning system.
      ● All students are required to submit this assignment.
      General homework policies:
      A student may discuss the problems with others. However, the work a student turns in must
      be created COMPLETELY by oneself ALONE. A student may not share ANY written work or
      pictures, nor may one copy answers from any source other than one’s own brain.
      Each student MUST LIST on the homework paper the name of every person he/she has
      discussed or worked with. If the answer includes content from any other source, the
      student MUST STATE THE SOURCE. Failure to do so is cheating and will result in
      sanctions. Copying answers from someone else is cheating even if one lists their name(s) on
      the homework.
      If there is information you need to solve a problem, but the information is not stated in the
      problem, try to find the data somewhere. If you cannot find it, state what data you need,
      make a reasonable estimate of its value, and justify any assumptions you make. You will be
      graded not only on whether your answer is correct, but also on whether you have done an
      intelligent analysis.
      Submit your output, explanation, and your commands/scripts in one SINGLE PDF file.
      Q1 [20 marks + 5 Bonus marks]: Basic Operations of Pig
      You are required to perform some simple analysis using Pig on the n-grams dataset of
      Google books. An ‘n-gram’ is a phrase with n words. The dataset lists all n-grams present in
      books from books.google.com along with some statistics.
      In this question, you will use only the Google Books 1-grams (single words), which the rest
      of this question calls "bigrams". Please go to References [1] and [2] to download the two
      datasets. Each line in these two files has the following format (TAB separated):
      bigram year match_count volume_count
      An example for 1-grams would be:
      circumvallate 1978 335 91
      circumvallate 1979 261 95
      This means that in 1978 the word "circumvallate" occurred 335 times overall, in 91 distinct
      books; in 1979 it occurred 261 times, in 95 distinct books.
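The record layout above can be illustrated with a minimal Python sketch that splits one TAB-separated line into typed fields. The names `NgramRecord` and `parse_line` are hypothetical helpers introduced only for illustration; the assignment itself asks for Pig, so this is just a way to check your reading of the format.

```python
from typing import NamedTuple


class NgramRecord(NamedTuple):
    """One row of a 1-gram file; field names follow the format line above."""
    ngram: str
    year: int
    match_count: int
    volume_count: int


def parse_line(line: str) -> NgramRecord:
    """Split one TAB-separated line into its four typed fields."""
    ngram, year, match_count, volume_count = line.rstrip("\n").split("\t")
    return NgramRecord(ngram, int(year), int(match_count), int(volume_count))


# The first example row from the text, written with explicit TAB characters.
sample = "circumvallate\t1978\t335\t91"
record = parse_line(sample)
print(record)
```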
      (a) [Bonus 5 marks] Install Pig in your Hadoop cluster. You can reuse your Hadoop
      cluster from IEMS 5730 HW#0 and refer to the following link to install Pig 0.17.0 on
      the master node of your Hadoop cluster:
      http://pig.apache.org/docs/r0.17.0/start.html#Pig+Setup
      Submit the screenshot(s) of your installation process.
      If you choose not to do the bonus question in (a), you can use any well-installed Hadoop
      cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7]
      to complete the following parts of the question:
      (b) [5 marks] Upload these two files to HDFS and join them into one table.
      (c) [5 marks] For each unique bigram, compute its average number of occurrences per
      year. In the above example, the result is:
      circumvallate (335 + 261) / 2 = 298
      Notes: The denominator is the number of years in which that word has appeared.
      Assume the data set contains all the 1-grams in the last 100 years, and the above
      records are the only records for the word ‘circumvallate’. Then the average value is:
      (335 + 261) / 2 = 298,
      instead of
      (335 + 261) / 100 = 5.96
      (d) [10 marks] Output the 20 bigrams with the highest average number of occurrences
      per year, along with their corresponding average values, sorted in descending order. If
      multiple bigrams have the same average value, write down any one you like (that is,
      break ties as you wish).
      You need to write a Pig script to perform this task and save the output into HDFS.
      Hints:
      ● This problem is very similar to the word counting example shown in the lecture notes
      of Pig. You can use the code there and just make some minor changes to perform
      this task.
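To make the arithmetic in (c) and (d) concrete, here is a small Python sketch of the same computation: sum `match_count` per word, divide by the number of distinct years in which the word appears, then keep the highest averages. It is only a reference for checking results on a tiny sample, not the required Pig script. The function names and the third sample row are hypothetical, and the sketch assumes each (word, year) pair occurs at most once in the combined data.

```python
import heapq
from collections import defaultdict


def average_per_year(records):
    """Map each word to total match_count / number of distinct years seen."""
    totals = defaultdict(int)
    years = defaultdict(set)
    for word, year, match_count, _volume_count in records:
        totals[word] += match_count
        years[word].add(year)
    return {word: totals[word] / len(years[word]) for word in totals}


def top_k(averages, k=20):
    """Return the k (word, average) pairs with the largest averages, descending."""
    return heapq.nlargest(k, averages.items(), key=lambda item: item[1])


records = [
    ("circumvallate", 1978, 335, 91),  # from the assignment text
    ("circumvallate", 1979, 261, 95),  # from the assignment text
    ("hypothetical", 1980, 10, 3),     # made-up row, for illustration only
]
averages = average_per_year(records)
print(top_k(averages, k=20))
```

For "circumvallate" the sketch divides by 2 (the years in which the word appears), matching the note in (c), rather than by the length of the whole time span.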
      Q2 [20 marks + 5 bonus marks]: Basic Operations of Hive
      In this question, you are asked to repeat Q1 using Hive and then compare the performance
      between Hive and Pig.
      (a) [Bonus 5 marks] Install Hive on top of your own Hadoop cluster. You can reuse your
      Hadoop cluster from IEMS 5730 HW#0 and refer to the following link to install Hive
      2.3.8 on the master node of your Hadoop cluster:
      https://cwiki.apache.org/confluence/display/Hive/GettingStarted
      Submit the screenshot(s) of your installation process.
      If you choose not to do the bonus question in (a), you can use any well-installed Hadoop
      cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7].
      (b) [20 marks] Write a Hive script to perform exactly the same task as that of Q1 with
      the same datasets stored in the HDFS. Rerun the Pig script in this cluster and
      compare the performance between Pig and Hive in terms of overall run-time and
      explain your observation.
      Hints:
      ● Hive will store its tables on HDFS, and those locations need to be bootstrapped:
      $ hdfs dfs -mkdir -p /tmp
      $ hdfs dfs -mkdir -p /user/hive/warehouse
      $ hdfs dfs -chmod g+w /tmp
      $ hdfs dfs -chmod g+w /user/hive/warehouse
      ● While working with the interactive shell (or otherwise), you should first test on a small
      subset of the data instead of the whole data set. Once your Hive commands/scripts
      work as desired, you can then run them on the complete data set.
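For the run-time comparison in (b), one possible way to obtain comparable numbers is to time each job from launch to exit with the same wall-clock timer. The Python sketch below does this for an arbitrary command; `time_command` is a hypothetical helper, and the command timed here is a trivial stand-in so that the sketch runs on any machine.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    return time.perf_counter() - start


# Stand-in command (a Python interpreter that does nothing).
elapsed = time_command([sys.executable, "-c", "pass"])
print(f"{elapsed:.3f} s")
```

On the cluster, the argument list could instead be something like `["pig", "-f", "q1.pig"]` or `["hive", "-f", "q2.hql"]`, where both script names are assumptions. Running each job several times and reporting the spread gives a fairer comparison than a single run.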
      Q3 [30 marks + 10 Bonus marks]: Similar Users Detection in the MovieLens Dataset using Pig
      Similar-user detection, which aims to group users with similar interests, behaviors,
      actions, or general patterns, has drawn a lot of attention in the machine learning field.
      In this homework, you will implement a similar-user detection algorithm for an online
      movie rating system. The basic idea is that users who give similar scores to the same
      movies may share tastes or interests and can be grouped as similar users.
      To detect similar users, we need to calculate the similarity between each user pair. In this
      homework, the similarity between a given pair of users (e.g. A and B) is measured as the
      total number of movies both A and B have watched divided by the total number of
      movies watched by either A or B. The following is the formal definition of similarity: Let
      M(A) be the set of all the movies user A has watched. Then the similarity between user A
      and user B is defined as:
      $$\mathrm{Similarity}(A, B) = \frac{|M(A) \cap M(B)|}{|M(A) \cup M(B)|} \qquad (**)$$
      where |S| means the cardinality of set S.
      (Note: if |M(A) ∪ M(B)| = 0, we set the similarity to be 0.)
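As a quick check of definition (**), the following Python sketch computes the similarity for two sets of movie IDs, including the zero-union case from the note. The function name `similarity` and the sample movie IDs are made up for illustration.

```python
def similarity(movies_a, movies_b):
    """|M(A) ∩ M(B)| / |M(A) ∪ M(B)|, defined as 0 when the union is empty."""
    union = movies_a | movies_b
    if not union:
        return 0.0
    return len(movies_a & movies_b) / len(union)


# Hypothetical users: two movies in common, four movies watched by either.
m_a = {1, 2, 3}
m_b = {2, 3, 4}
print(similarity(m_a, m_b))      # 2 / 4
print(similarity(set(), set()))  # empty union, so 0 by the note
```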
      The following figure illustrates the idea:
      Two datasets [3][4] with different sizes are provided by MovieLens. Each user is represented
      by its unique userID and each movie is represented by its unique movieID. The format of the
      data set is as follows:
      <userID>, <movieID>
      Write a program in Pig to detect the TOP K similar users for each user. You can use the
      cluster you built for Q1 and Q2 or you can use the IE DIC or one provided by the Google
      Cloud/AWS [5, 6, 7].
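One way to reason about the task before writing Pig is to note that the intersection sizes can be obtained by grouping users per movie and enumerating user pairs within each group (roughly what a self-join on movieID does), while the union size follows from |M(A)| + |M(B)| minus the intersection. The Python sketch below illustrates this on a tiny made-up dataset. All names and data are hypothetical, and pairs with no common movie (similarity 0) are simply omitted from the result.

```python
import heapq
from collections import Counter, defaultdict
from itertools import combinations


def co_watch_counts(ratings):
    """Count, for every user pair (a, b) with a < b, the movies both watched."""
    users_by_movie = defaultdict(set)
    for user_id, movie_id in ratings:
        users_by_movie[movie_id].add(user_id)
    counts = Counter()
    for users in users_by_movie.values():
        for pair in combinations(sorted(users), 2):
            counts[pair] += 1
    return counts


def top_k_similar(ratings, k):
    """For each user, return up to k (similarity, other_user) pairs, best first."""
    distinct = set(ratings)  # ignore duplicate (user, movie) rows
    watched = Counter(user for user, _movie in distinct)
    neighbours = defaultdict(list)
    for (a, b), both in co_watch_counts(distinct).items():
        sim = both / (watched[a] + watched[b] - both)  # intersection / union
        neighbours[a].append((sim, b))
        neighbours[b].append((sim, a))
    return {user: heapq.nlargest(k, cands) for user, cands in neighbours.items()}


# Made-up (userID, movieID) rows, for illustration only.
ratings = [(1, 10), (1, 11), (1, 12), (2, 11), (2, 12), (2, 13), (3, 12)]
counts = co_watch_counts(ratings)
top = top_k_similar(ratings, k=1)
print(counts.most_common(10))
print(top)
```

Ties in similarity are broken here by the larger user ID, which is an arbitrary choice made by the tuple comparison.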
      (a) [10 marks] For each pair of users in each of the datasets [3] and [4], output the
      number of movies they have both watched.
      For your homework submission, you need to submit i) the Pig script and ii) the
      list of the 10 pairs of users having the largest number of movies watched by
      both users in the pair within the corresponding dataset. The format of your
      answer should be as follows: