Day 8 - Python

Problem Statement: Word Frequency Analyzer

You are given a piece of text containing words and punctuation. Your task is to create a Python program that takes this text as input and outputs a DataFrame containing each word in the text along with its frequency, sorted first by frequency in descending order and then by word in ascending order.

Implement the function count_word_frequency(txt), which takes a string txt as input and returns a dictionary mapping each word in the text to its frequency.

Next, sort the words by frequency in descending order and then alphabetically, and store the results in a pandas DataFrame.

Your task is to complete the given Python code to accomplish the above task.

Input Format:

A single line of text containing words and punctuation.

Output Format:

Display the words and their frequencies in a tabular format (DataFrame) with two columns: 'Word' and 'Frequency'.

The DataFrame should be sorted first by frequency in descending order and then by word in ascending order.

Do not display the index.

Constraints:

Words are case-insensitive.

Punctuation should be ignored.

Example:

Input:

Hello, world! This is a test. Is this a good test?

Output:

Word   Frequency
a      2
is     2
test   2
this   2
good   1
hello  1
world  1

Note:

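In the example, words like "This" and "this" are considered the same, and punctuation is ignored. The words are sorted first by frequency (highest first) and then alphabetically, which is why the four words that occur twice (a, is, test, this) appear in alphabetical order.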
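To see the normalization described in the Note on its own, here is a small sketch that applies the same translate/lowercase/split steps used in the solution to the example sentence. The helper name normalize_tokens is hypothetical, introduced only for illustration.

```python
import string


def normalize_tokens(txt):
    """Hypothetical helper: strip ASCII punctuation, lowercase, split on whitespace."""
    # The third argument of str.maketrans lists characters to delete
    table = str.maketrans('', '', string.punctuation)
    return txt.translate(table).lower().split()


tokens = normalize_tokens("Hello, world! This is a test. Is this a good test?")
print(tokens)
# ['hello', 'world', 'this', 'is', 'a', 'test', 'is', 'this', 'a', 'good', 'test']
```

Because string.punctuation only contains ASCII punctuation (including the apostrophe), a contraction such as "don't" comes out as "dont", and non-ASCII punctuation such as curly quotes is left in place.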

SOLUTION:

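import string

import pandas as pd


def count_word_frequency(txt):
    # Remove ASCII punctuation, then lowercase so "This" and "this" match
    txt = txt.translate(str.maketrans('', '', string.punctuation)).lower()

    word_frequency = {}
    words = txt.split()  # split on any run of whitespace

    for i in words:
        # get() returns 0 for a word seen for the first time
        word_frequency[i] = word_frequency.get(i, 0) + 1

    return word_frequency


text = input().strip()

word_frequency = count_word_frequency(text)

# Sort by frequency descending (negated count), then by word ascending
result = sorted(word_frequency.items(), key=lambda x: (-x[1], x[0]))

df = pd.DataFrame(result, columns=['Word', 'Frequency'])

# index=False hides the 0, 1, 2, ... row labels
print(df.to_string(index=False))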
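The manual dictionary loop can also be written with collections.Counter from the standard library, which counts an iterable in one call. Counter is a dict subclass, so the sorting and DataFrame steps work unchanged. This is a sketch of a variant, not the solution above; the example sentence is hard-coded in place of input() so the block runs on its own.

```python
import string
from collections import Counter

import pandas as pd


def count_word_frequency(txt):
    # Same normalization as the solution: drop ASCII punctuation, lowercase
    txt = txt.translate(str.maketrans('', '', string.punctuation)).lower()
    # Counter tallies every whitespace-separated token in a single pass
    return Counter(txt.split())


text = "Hello, world! This is a test. Is this a good test?"
word_frequency = count_word_frequency(text)

# Highest count first (negated), ties broken alphabetically
result = sorted(word_frequency.items(), key=lambda x: (-x[1], x[0]))
df = pd.DataFrame(result, columns=['Word', 'Frequency'])
print(df.to_string(index=False))
```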

Insights:

The function count_word_frequency(txt) takes a text input, removes punctuation and converts all words to lowercase, then counts the frequency of each word.
The str.translate() method removes punctuation from the text. Its translation table is built with str.maketrans('', '', string.punctuation): characters passed as the third argument are mapped to None, so translate() deletes them.
The code converts all words to lowercase before counting their frequencies. This ensures that words like "This" and "this" are treated as the same word.
The frequency of each word is stored in a dictionary called word_frequency, where the key is the word and the value is the count of occurrences.
The code splits the text into words on whitespace and iterates over them. Inside the loop, word_frequency.get(i, 0) + 1 increments the count for each word; get() returns 0 for a word that has not been seen yet, so new and repeated words are handled by the same line.
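After counting frequencies, the code sorts the dictionary items with sorted() and the key lambda x: (-x[1], x[0]). Each item is a (word, count) tuple; negating the count makes higher frequencies sort first, and x[0] breaks ties by comparing the words alphabetically.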
The sorted list of (word, count) tuples is converted into a pandas DataFrame with columns 'Word' and 'Frequency'. The DataFrame is printed to the console using print(df.to_string(index=False)), which omits the index.
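The ordering can also be delegated to pandas: build the DataFrame in first-seen order, then call DataFrame.sort_values with two sort keys and a matching list of ascending flags. This is a sketch of an alternative to the lambda key, not the approach used in the solution; the example sentence is hard-coded.

```python
import string

import pandas as pd

text = "Hello, world! This is a test. Is this a good test?"
words = text.translate(str.maketrans('', '', string.punctuation)).lower().split()

# Count in first-seen order, exactly as the solution's loop does
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Build the DataFrame unsorted, then sort on two columns:
# Frequency descending, Word ascending to break ties
df = pd.DataFrame(list(counts.items()), columns=['Word', 'Frequency'])
df = df.sort_values(by=['Frequency', 'Word'], ascending=[False, True])

# Renumber rows 0..n-1 after sorting (not shown when index=False)
df = df.reset_index(drop=True)

print(df.to_string(index=False))
```

Since each word occupies exactly one row, every (Frequency, Word) pair is unique, so the order is fully determined and both approaches print the same table.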

Happy Coding! :)

Master Coding from Scratch
