• <button id="ecio8"></button>
  • <li id="ecio8"></li>
    <s id="ecio8"></s>
    <dl id="ecio8"></dl>
    <center id="ecio8"><noscript id="ecio8"></noscript></center>
    • <table id="ecio8"><source id="ecio8"></source></table>
      <bdo id="ecio8"></bdo>
    • <s id="ecio8"></s>

      CS5012代做、代寫Python設(shè)計(jì)程序

      時(shí)間:2024-03-03  來(lái)源:  作者: 我要糾錯(cuò)



      CS5012 Mark-Jan Nederhof Practical 1
      Practical 1: Part of speech tagging:
      three algorithms
      This practical is worth 50% of the coursework component of this module. Its due
      date is Wednesday 6th of March 2024, at 21:00. Note that MMS is the definitive source
      for deadlines and weights.
      The purpose of this assignment is to gain understanding of the Viterbi algorithm,
      and its application to part-of-speech (POS) tagging. The Viterbi algorithm will be
      related to two other algorithms.
      You will also get to see the Universal Dependencies treebanks. The main purpose
      of these treebanks is dependency parsing (to be discussed later in the module), but
      here we only use their part-of-speech tags.
      Getting started
      We will be using Python3. On the lab (Linux) machines, you need the full path
      /usr/local/python/bin/python3, which is set up to work with NLTK. (Plain
      python3 won’t be able to find NLTK.)
      If you run Python on your personal laptop, then next to NLTK (https://www.
      nltk.org/), you will also need to install the conllu package (https://pypi.org/
      project/conllu/).
      To help you get started, download gettingstarted.py and the other Python
      files, and the zip file with treebanks from this directory. After unzipping, run
      /usr/local/python/bin/python3 gettingstarted.py. You may, but need not, use
      parts of the provided code in your submission.
      The three treebanks come from Universal Dependencies. If you are interested,
      you can download the entire set of treebanks from https://universaldependencies.
      org/.
      1
      Parameter estimation
      First, we write code to estimate the transition probabilities and the emission probabilities of an HMM (Hidden Markov Model), on the basis of (tagged) sentences from
      a training corpus from Universal Dependencies. Do not forget to involve the start-ofsentence marker ⟨s⟩ and the end-of-sentence marker ⟨/s⟩ in the estimation.
      The code in this part is concerned with:
      • counting occurrences of one part of speech following another in a training corpus,
      • counting occurrences of words together with parts of speech in a training corpus,
      • relative frequency estimation with smoothing.
      As discussed in the lectures, smoothing is necessary to avoid zero probabilities for
      events that were not witnessed in the training corpus. Rather than implementing a
      form of smoothing yourself, you can for this assignment take the implementation of
      Witten-Bell smoothing in NLTK (among the implementations of smoothing in NLTK,
      this seems to be the most robust one). An example of use for emission probabilities is
      in file smoothing.py; one can similarly apply smoothing to transition probabilities.
      Three algorithms for POS tagging
      Algorithm 1: eager algorithm
      First, we implement a naive algorithm that chooses the POS tag for the i-th token
      on the basis of the chosen (i − 1)-th tag and the i-th token. To be more precise, we
      determine for each i = 1, . . . , n, in this order:
      tˆi = argmax
      ti
      P(ti
      | tˆi−1) · P(wi
      | ti)
      assuming tˆ0 is the start-of-sentence marker ⟨s⟩. Note that the end-of-sentence marker
      ⟨/s⟩ is not even used here.
      Algorithm 2: Viterbi algorithm
      Now we implement the Viterbi algorithm, which determines the sequence of tags for a
      given sentence that has the highest probability. As discussed in the lectures, this is:
      tˆ1 · · ·tˆn = argmax
      t1···tn
       Yn
      i=1
      P(ti
      | ti−1) · P(wi
      | ti)
      !
      · P(tn+1 | tn)
      2
      where the tokens of the input sentence are w1 · · ·wn, and t0 = ⟨s⟩ and tn+1 = ⟨/s⟩ are
      the start-of-sentence and end-of-sentence markers, respectively.
      To avoid underflow for long sentences, we need to use log probabilities.
      Algorithm 3: individually most probable tags
      We now write code that determines the most probable part of speech for each token
      individually. That is, for each i, computed is:
      tˆi = argmax
      ti
      X
      t1···ti−1ti+1···tn
       Yn
      i=1
      P(ti
      | ti−1) · P(wi
      | ti)
      !
      · P(tn+1 | tn)
      To compute this effectively, we need to use forward and backward values, as discussed
      in the lectures on the Baum-Welch algorithm, making use of the fact that the above is
      equivalent to:
      tˆi = argmax
      ti
      P
      t1···ti−1
      Qi
      k=1 P(tk | tk−1) · P(wk | tk)
      
      ·
      P
      ti+1···tn
      Qn
      k=i+1 P(tk | tk−1) · P(wk | tk)
      
      · P(tn+1 | tn)
      The computation of forward values is very similar to the Viterbi algorithm, so you
      may want to copy and change the code you already had, replacing statements that
      maximise by corresponding statements that sum values together. Computation of
      backward values is similar to computation of forward values.
      See logsumexptrick.py for a demonstration of the use of log probabilities when
      probabilities are summed, without getting underflow in the conversion from log probabilities to probabilities and back.
      Evaluation
      Next, we write code to determine the percentages of tags in a test corpus that are
      guessed correctly by the above three algorithms. Run experiments for the training
      and test corpora of the three included treebanks, and possibly for treebanks of more
      languages (but not for more than 5; aim for quality rather than quantity). Compare
      the performance of the three algorithms.
      You get the best experience out of this practical if you also consider the languages of
      the treebanks. What do you know (or what can you find out) about the morphological
      and syntactic properties of these languages? Can you explain why POS tagging is more
      difficult for some languages than for others?
      3
      Requirements
      Submit your Python code and the report.
      It should be possible to run your implementation of the three algorithms on the
      three corpora simply by calling from the command line:
      python3 p1.py
      You may add further functionality, but then add a README file to explain how to run
      that functionality. You should include the three treebanks needed to run the code, but
      please do not include the entire set of hundreds of treebanks from Universal
      Dependencies, because this would be a huge waste of disk space and band
      width for the marker.
      Marking is in line with the General Mark Descriptors (see pointers below). Evidence of an acceptable attempt (up to 7 marks) could be code that is not functional but
      nonetheless demonstrates some understanding of POS tagging. Evidence of a reasonable attempt (up to 10 marks) could be code that implements Algorithm 1. Evidence
      of a competent attempt addressing most requirements (up to 13 marks) could be fully
      correct code in good style, implementing Algorithms 1 and 2 and a brief report. Evidence of a good attempt meeting nearly all requirements (up to 16 marks) could be
      a good implementation of Algorithms 1 and 2, plus an informative report discussing
      meaningful experiments. Evidence of an excellent attempt with no significant defects
      (up to 18 marks) requires an excellent implementation of all three algorithms, and a
      report that discusses thorough experiments and analysis of inherent properties of the
      algorithms, as well as awareness of linguistic background discussed in the lectures. An
      exceptional achievement (up to 20 marks) in addition requires exceptional understanding of the subject matter, evidenced by experiments, their analysis and reflection in
      the report.
      Hints
      Even though this module is not about programming per se, a good programming style
      is expected. Choose meaningful variable and function names. Break up your code into
      small functions. Avoid cryptic code, and add code commenting where it is necessary for
      the reader to understand what is going on. Do not overengineer your code; a relatively
      simple task deserves a relatively simple implementation.
      You cannot use any of the POS taggers already implemented in NLTK. However,
      you may use general utility functions in NLTK such as ngrams from nltk.util, and
      FreqDist and WittenBellProbDist from nltk.
      4
      When you are reporting the outcome of experiments, the foremost requirement is
      reproducibility. So if you give figures or graphs in your report, explain precisely what
      you did, and how, to obtain those results.
      Considering current class sizes, please be kind to your marker, by making their task
      as smooth as possible:
      • Go for quality rather than quantity. We are looking for evidence of understanding
      rather than for lots of busywork. Especially understanding of language and how
      language works from the perpective of the HMM model is what this practical
      should be about.
      • Avoid Python virtual environments. These blow up the size of the files that
      markers need to download. If you feel the need for Python virtual environments,
      then you are probably overdoing it, and mistake this practical for a software
      engineering project, which it most definitely is not. The code that you upload
      would typically consist of three or four .py files.
      • You could use standard packages such as numpy or pandas, which the marker will
      likely have installed already, but avoid anything more exotic. Assume a version
      of Python3 that is the one on the lab machines or older; the marker may not
      have installed the latest bleeding-edge version yet.
      • We strongly advise against letting the report exceed 10 pages. We do not expect
      an essay on NLP or the history of the Viterbi algorithm, or anything of the sort.
      • It is fine to include a couple of graphs and tables in the report, but don’t overdo
      it. Plotting accuracy against any conceivable hyperparameter, just for the sake
      of producing lots of pretty pictures, is not what we are after.
      請(qǐng)加QQ:99515681  郵箱:99515681@qq.com   WX:codehelp 

      標(biāo)簽:

      掃一掃在手機(jī)打開當(dāng)前頁(yè)
    • 上一篇:代做CS252編程、代寫C++設(shè)計(jì)程序
    • 下一篇:AcF633代做、Python設(shè)計(jì)編程代寫
    • 無(wú)相關(guān)信息
      昆明生活資訊

      昆明圖文信息
      蝴蝶泉(4A)-大理旅游
      蝴蝶泉(4A)-大理旅游
      油炸竹蟲
      油炸竹蟲
      酸筍煮魚(雞)
      酸筍煮魚(雞)
      竹筒飯
      竹筒飯
      香茅草烤魚
      香茅草烤魚
      檸檬烤魚
      檸檬烤魚
      昆明西山國(guó)家級(jí)風(fēng)景名勝區(qū)
      昆明西山國(guó)家級(jí)風(fēng)景名勝區(qū)
      昆明旅游索道攻略
      昆明旅游索道攻略
    • 高仿包包訂製

      關(guān)于我們 | 打賞支持 | 廣告服務(wù) | 聯(lián)系我們 | 網(wǎng)站地圖 | 免責(zé)聲明 | 幫助中心 | 友情鏈接 |

      Copyright © 2025 kmw.cc Inc. All Rights Reserved. 昆明網(wǎng) 版權(quán)所有
      ICP備06013414號(hào)-3 公安備 42010502001045

      欧美成人免费全部观看天天性色,欧美日韩视频一区三区二区,欧洲美女与动性zozozo,久久久国产99久久国产一
    • <button id="ecio8"></button>
    • <li id="ecio8"></li>
      <s id="ecio8"></s>
      <dl id="ecio8"></dl>
      <center id="ecio8"><noscript id="ecio8"></noscript></center>
      • <table id="ecio8"><source id="ecio8"></source></table>
        <bdo id="ecio8"></bdo>
      • <s id="ecio8"></s>
        主站蜘蛛池模板: 亚洲成在人线在线播放无码| 中文字幕丝袜诱惑| 色综合久久精品中文字幕首页| 黑寡妇被绿巨人擦gif图| 浪荡女天天不停挨cao日常视频 | 国产女人高潮抽搐喷水免费视频| 国产无人区卡一卡二卡三网站| 亚洲色婷婷一区二区三区| 中文字幕丰满伦子无码| 亚洲娇小性xxxx色| 欧美日韩小视频| 天天躁日日躁狠狠久久| 啊轻点灬大ji巴太粗太长了欧美| 久久综合色综合| 色狠狠一区二区三区香蕉蜜桃| 波多野结衣同性系列698| 国模视频一区二区| 便器调教(肉体狂乱)小说| 中国熟妇xxxx| 色一情一乱一伦一区二区三区日本 | 国产日产在线观看| 亚洲午夜久久久精品影院| 中文字幕伊人久久网| 色噜噜狠狠一区二区三区果冻| 挺进男同的屁股眼o漫画| 国产在线精品美女观看| 亚洲一区二区三区四区视频| 337p人体欧洲人体亚| 久久a级毛片免费观看| 福利视频导航网站| 欧美黑人又粗又大又爽免费| 天天干天天拍天天射| 免费a级午夜绝情美女视频| 一级毛片在线播放| 综合激情区视频一区视频二区| 日日碰狠狠添天天爽超碰97| 国产乱子伦精品视频| 久久午夜宫电影网| 青草青草久热精品视频在线观看| 日韩经典在线观看| 国产婷婷色一区二区三区深爱网|