Problem Statement: Word Frequency Analyzer
You are given a piece of text containing words and punctuation. Your task is to create a Python program that takes this text as input and outputs a DataFrame containing each word in the text along with its frequency, sorted first by frequency in descending order and then by word in ascending order.
Implement the function count_word_frequency(txt) which takes a string txt as input and returns a dictionary containing the frequency of each word in the text.
Next, sort the words based on their frequencies and then alphabetically, and store the results in a Pandas DataFrame.
Your task is to complete the given Python code to accomplish the above task.
Input Format:
A single line of text containing words and punctuation.
Output Format:
Display the words and their frequencies in a tabular format (DataFrame) with two columns: 'Word' and 'Frequency'.
The DataFrame should be sorted first by frequency in descending order and then by word in ascending order.
Do not display the index.
Constraints:
Words are case-insensitive.
Punctuation should be ignored.
Example:
Input:
Hello, world! This is a test. Is this a good test?
Output:
Word Frequency
a 2
is 2
test 2
this 2
good 1
hello 1
world 1
Note:
In the example, words like "This" and "this" are considered the same, and punctuation is ignored. The words are sorted first by frequency and then alphabetically.
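The ordering rule in the note can be checked against the example counts. This is a minimal sketch, assuming the counts are already available as a plain dictionary; the names example_counts and ordered are hypothetical and not part of the given code.

```python
# Hypothetical counts, copied by hand from the example above.
example_counts = {
    "hello": 1, "world": 1, "this": 2, "is": 2,
    "a": 2, "test": 2, "good": 1,
}

# Negating the count sorts frequency in descending order;
# the word itself breaks ties in ascending alphabetical order.
ordered = sorted(example_counts.items(), key=lambda item: (-item[1], item[0]))

for word, freq in ordered:
    print(word, freq)
```

The printed rows should follow the order of the example output: the four words with frequency 2 first, alphabetically, then the three words with frequency 1.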
SOLUTION:
import string
import pandas as pd

def count_word_frequency(txt):
    # Strip punctuation and lowercase so "This" and "this" count as one word.
    txt = txt.translate(str.maketrans('', '', string.punctuation)).lower()
    word_frequency = {}
    words = txt.split()
    for word in words:
        word_frequency[word] = word_frequency.get(word, 0) + 1
    return word_frequency

text = input().strip()
word_frequency = count_word_frequency(text)
# Sort by frequency (descending), then by word (ascending).
result = sorted(word_frequency.items(), key=lambda x: (-x[1], x[0]))
df = pd.DataFrame(result, columns=['Word', 'Frequency'])
print(df.to_string(index=False))
Insights:
- The function count_word_frequency(txt) takes a text input, removes punctuation and converts all words to lowercase, then counts the frequency of each word.
- The translate() method removes punctuation from the text. It uses a translation table built by str.maketrans(), whose third argument lists the characters to delete, so every character in string.punctuation is mapped to None.
- The code converts all words to lowercase before counting their frequencies. This ensures that words like "This" and "this" are treated as the same word.
- The frequency of each word is stored in a dictionary called word_frequency, where the key is the word and the value is the count of occurrences.
- The code splits the text into words using whitespace as a delimiter and iterates over each word. Inside the loop, the code updates the word_frequency dictionary by incrementing the count for each word encountered.
- After counting frequencies, the code sorts the dictionary items with sorted(), using a lambda that returns the tuple (-frequency, word) as the key. Negating the frequency gives descending order by count, and the word breaks ties in ascending alphabetical order.
- sorted() returns a list of (word, frequency) tuples, not a dictionary. That list is converted into a Pandas DataFrame with columns 'Word' and 'Frequency' and printed with print(df.to_string(index=False)), which ensures that the index is not displayed.
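The counting step described above can also be written with collections.Counter from the standard library. The following is a sketch of an alternative, not the solution's own method; the function name count_word_frequency_counter and the variables table, counts, and rows are hypothetical.

```python
import string
from collections import Counter

def count_word_frequency_counter(txt):
    """Hypothetical variant of count_word_frequency built on Counter."""
    # The third argument of str.maketrans lists the characters to delete.
    table = str.maketrans('', '', string.punctuation)
    cleaned = txt.translate(table).lower()
    # Counter tallies the whitespace-separated words in one pass.
    return Counter(cleaned.split())

counts = count_word_frequency_counter(
    "Hello, world! This is a test. Is this a good test?"
)
# Same sort key as the solution: frequency descending, then word ascending.
rows = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
print(rows)
```

The rows list can be passed to pd.DataFrame(rows, columns=['Word', 'Frequency']) exactly as in the solution. Note that string.punctuation includes the apostrophe and the hyphen, so with either version "don't" is counted as "dont" and "well-known" as "wellknown".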