Description
Write a Python program WSD.py that implements the Naive Bayes algorithm for word sense disambiguation, as discussed in class. Specifically, your program will have to assign a given target word its correct sense in a number of test examples. You are not to use external libraries such as pandas, scikit-learn, or NLTK. Please implement the Naive Bayes algorithm and cross-validation yourself; do not use scikit-learn (or any other machine learning library).

You will train and test your program on a dataset consisting of textual examples for the noun “plant,” drawn from the British National Corpus, where each example is manually annotated with its correct sense of “plant.” Consider for example the following instance:

September 1991 1.30 You can win a great new patio Pippa Wood How to cope with a slope Bulbs plant now for spring blooms

The target word is identified by an SGML tag, and the sense corresponding to this particular instance is that of plant%living.

The dataset (plant.wsd) is available on Canvas in the Assignment 3 details.

Programming guidelines:

Your program should perform the following steps:

1. Take one argument consisting of the name of one file, which includes the annotated instances.
2. Determine from the entire file the total number of instances and the possible sense labels.
3. Create five folds, for a five-fold cross-validation evaluation of your Naive Bayes WSD implementation. Specifically, divide the total number of instances by five, round up to determine the number of instances in folds 1 through 4, and include the remaining instances in fold 5. For example, if you have 122 total instances, you will have five folds with sizes 25, 25, 25, 25, and 22 respectively.
4. Implement and run the Naive Bayes WSD algorithm using a five-fold cross-validation scheme. In each run, you will:
   a. Use one of the folds as your test data, and the remaining folds together as your training data (e.g., in the first run, use fold 1 as test, and folds 2 through 5 as training; etc.);
   b. Collect the counts you need from the training data, and use the Naive Bayes algorithm to predict the senseid-s for the instances in the test data;
   c. Evaluate the performance of your system by comparing the predictions made by your Naive Bayes word sense disambiguation system on the test data fold against the ground truth annotations (available as senseid-s in the test data).

Considerations for the Naive Bayes implementation:

1. All the words found in the context of the target word will represent the features to be considered.
2. Address zero counts using add-one smoothing.
3. Work in log space to avoid underflow due to repeated multiplication of small numbers.

The WSD.py program should be run using a command like this:

% python WSD.py plant.wsd

The program should print to standard output the accuracies of the system (as a percentage) for each of the five folds, as well as the average accuracy. It should also generate a file called plant.wsd.out, which includes for each fold the id of the words in the test file along with the senseid predicted by the system. Clearly delineate each fold with a line like “Fold 1”, “Fold 2”, etc. For instance, the following are examples of lines drawn from a plant.wsd.out file:

Fold 1
plant.1000000 plant%factory
plant.1000001 plant%factory
plant.1000002 plant%living
…
Fold 2
plant.1000041 plant%living
plant.1000042 plant%living
…

Write-up guidelines:

Create a text file called WSD.answers, and include the following information:

1. How complete your program is. Even if your program is not complete or you are getting compilation errors, you will get partial credit proportionally. Just mention clearly and accurately how far you got.
2. If your program is complete, a line consisting only of the name of the dataset: plant.wsd
3. If your program is complete, the accuracies of your Naive Bayes system for each of the five folds, as well as the average accuracy.
4. If your program is complete, identify three errors in the automatically tagged sense data, and analyze them (i.e., for each error, write one brief sentence describing the possible reason for the error and how it could be fixed).

Give me code that fulfills all of the conditions above, is plagiarism-free and not AI-generated, and explain the code in detail.
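
The fold sizes in step 3 and the train/test rotation in step 4a can be sketched as follows. This is only an illustration: the function names make_folds and train_test_splits are made up here, and the sketch assumes the instances are kept in file order and split into contiguous blocks, which the assignment text does not state explicitly.

```python
def make_folds(instances, k=5):
    """Split instances into k contiguous folds (assumed: file order is kept).

    Folds 1..k-1 each get ceil(n / k) instances; the last fold gets the rest.
    """
    n = len(instances)
    size = -(-n // k)  # ceiling division using integers only
    folds = [instances[i * size:(i + 1) * size] for i in range(k - 1)]
    folds.append(instances[(k - 1) * size:])
    return folds


def train_test_splits(folds):
    """Yield (fold_number, training_instances, test_instances) for each run."""
    for i, test in enumerate(folds):
        # Training data is every instance that is not in the current test fold.
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        yield i + 1, train, test


if __name__ == "__main__":
    # Stand-in data: 122 integers in place of 122 parsed instances.
    folds = make_folds(list(range(122)))
    print([len(fold) for fold in folds])  # [25, 25, 25, 25, 22]
    for number, train, test in train_test_splits(folds):
        print("Fold %d: %d training, %d test" % (number, len(train), len(test)))
```

With 122 instances the ceiling of 122 / 5 is 25, so the first four folds take 100 instances and the fifth takes the remaining 22, which matches the example in step 3.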
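
The three Naive Bayes considerations (context words as features, add-one smoothing, log space) might look like the sketch below. It is one common formulation and may differ in detail from the version discussed in class. The names train and predict, the toy training examples, the choice to skip test words never seen in training, and the alphabetical tie-break are all assumptions made for illustration. Parsing of plant.wsd is left out, so each example is assumed to be already reduced to a (sense, list of context words) pair. Only the standard library is used.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Collect counts from (sense, context_words) pairs."""
    sense_counts = Counter()            # number of training instances per sense
    word_counts = defaultdict(Counter)  # word frequencies per sense
    vocab = set()                       # all distinct training words
    for sense, words in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab


def predict(model, words):
    """Return the sense with the highest log posterior score."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense in sorted(sense_counts):  # assumed tie-break: alphabetical order
        score = math.log(sense_counts[sense] / total)  # log prior
        # Add-one smoothing: add 1 to each count and |V| to the denominator.
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in words:
            if word not in vocab:
                continue  # assumption: ignore words unseen in training
            score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


if __name__ == "__main__":
    # Toy examples invented for illustration; they are not from plant.wsd.
    toy = [
        ("plant%living", ["bulbs", "spring", "blooms"]),
        ("plant%living", ["garden", "flowers", "spring"]),
        ("plant%factory", ["workers", "chemical", "production"]),
        ("plant%factory", ["power", "production", "workers"]),
    ]
    model = train(toy)
    print(predict(model, ["spring", "garden"]))
    print(predict(model, ["workers", "power"]))
```

Summing logarithms replaces the product of many small probabilities, so long contexts do not underflow to zero, and the smoothed numerator is never zero, so the logarithm is always defined.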
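
The evaluation in step 4c and the plant.wsd.out layout could be sketched like this. The helper names accuracy and format_fold are hypothetical, the gold labels in the demo are invented, and printing two decimal places is an assumption, since the assignment only asks for a percentage.

```python
def accuracy(predicted, gold):
    """Percentage of predictions that match the gold senseids."""
    if not gold:
        return 0.0
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return 100.0 * correct / len(gold)


def format_fold(fold_number, ids, predicted):
    """Build the text block for one fold: a 'Fold N' line, then 'id senseid' lines."""
    lines = ["Fold %d" % fold_number]
    lines.extend("%s %s" % (inst_id, sense) for inst_id, sense in zip(ids, predicted))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Instance ids follow the sample output above; gold labels are invented.
    ids = ["plant.1000000", "plant.1000001", "plant.1000002"]
    predicted = ["plant%factory", "plant%factory", "plant%living"]
    gold = ["plant%factory", "plant%living", "plant%living"]

    print(format_fold(1, ids, predicted), end="")
    print("Fold 1 accuracy: %.2f%%" % accuracy(predicted, gold))
```

In a full program the blocks returned by format_fold for folds 1 through 5 would be written, in order, to the input file name with ".out" appended, and the five accuracy values would be averaged for the final line of standard output.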