Day 8 - Python

Problem Statement: Word Frequency Analyzer

You are given a piece of text containing words and punctuation. Your task is to create a Python program that takes this text as input and outputs a DataFrame containing each word in the text along with its frequency, sorted first by frequency in descending order and then by word in ascending order.

Implement the function count_word_frequency(txt) which takes a string txt as input and returns a dictionary containing the frequency of each word in the text.

Next, sort the words by frequency (highest first) and then alphabetically, and store the results in a Pandas DataFrame.

Your task is to complete the given Python code to accomplish the above task.

Input Format:

A single line of text containing words and punctuation.

Output Format:

Display the words and their frequencies in a tabular format (DataFrame) with two columns: 'Word' and 'Frequency'.

The DataFrame should be sorted first by frequency in descending order and then by word in ascending order.

Do not display the index.

Constraints:

Words are case-insensitive.

Punctuation should be ignored.
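To see what these two constraints do on their own, here is a minimal sketch of just the normalization step, using the same str.translate()/str.maketrans() approach as the solution further down. The helper name normalize is hypothetical and introduced only for illustration.

```python
import string

def normalize(txt):
    """Hypothetical helper: drop ASCII punctuation, lowercase, split on whitespace."""
    # The third argument of str.maketrans lists characters mapped to None,
    # so translate() deletes them from the string.
    table = str.maketrans('', '', string.punctuation)
    return txt.translate(table).lower().split()

words = normalize("Hello, world! This is a test. Is this a good test?")
print(words)
# ['hello', 'world', 'this', 'is', 'a', 'test', 'is', 'this', 'a', 'good', 'test']
```

One consequence worth knowing: string.punctuation holds only the 32 ASCII punctuation characters, so an apostrophe inside a word is deleted ("Don't" becomes "dont"), while non-ASCII marks such as curly quotes are left in place.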

Example:

Input:

Hello, world! This is a test. Is this a good test?

Output:

 Word  Frequency
    a          2
   is          2
 test          2
 this          2
 good          1
hello          1
world          1

Note:

In the example, words like "This" and "this" are considered the same, and punctuation is ignored. The words are sorted first by frequency in descending order, and words with the same frequency are then sorted alphabetically.
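The tie-breaking rule in the expected output can be reproduced with a single sort key. This sketch hard-codes the counts from the example above (the variable names are illustrative) and sorts on the tuple (-count, word):

```python
# Word counts for the example input, hard-coded for illustration.
counts = {'hello': 1, 'world': 1, 'this': 2, 'is': 2, 'a': 2, 'test': 2, 'good': 1}

# Negating the count makes higher frequencies sort first; the word itself
# then breaks ties in ascending (alphabetical) order.
ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
print(ranked)
# [('a', 2), ('is', 2), ('test', 2), ('this', 2), ('good', 1), ('hello', 1), ('world', 1)]
```

Negating the count works because the counts are integers; for a key that cannot be negated, two stable sorts (secondary key first, then primary key with reverse=True) achieve the same order.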


SOLUTION:

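import string
import pandas as pd


def count_word_frequency(txt):
    # Delete every ASCII punctuation character, then lowercase the text
    txt = txt.translate(str.maketrans('', '', string.punctuation)).lower()

    word_frequency = {}

    # split() with no argument splits on any run of whitespace
    words = txt.split()

    # Tally each word; unseen words start from a count of 0
    for word in words:
        word_frequency[word] = word_frequency.get(word, 0) + 1

    return word_frequency


text = input().strip()

word_frequency = count_word_frequency(text)

# Sort by frequency (descending), then by word (ascending)
result = sorted(word_frequency.items(), key=lambda x: (-x[1], x[0]))

df = pd.DataFrame(result, columns=['Word', 'Frequency'])

# Print the table without the index column
print(df.to_string(index=False))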
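If you prefer the standard library's collections.Counter, the counting loop can be collapsed into a single call. This is an alternative sketch, not the solution's method; it keeps the same normalization and still returns a plain dictionary, as the problem statement asks.

```python
import string
from collections import Counter

def count_word_frequency(txt):
    """Counter-based variant; returns a plain dict like the original solution."""
    # Same normalization as the solution: strip ASCII punctuation, lowercase.
    cleaned = txt.translate(str.maketrans('', '', string.punctuation)).lower()
    # Counter tallies every whitespace-separated word in one pass.
    return dict(Counter(cleaned.split()))

freq = count_word_frequency("Hello, world! This is a test. Is this a good test?")
print(freq['this'], freq['hello'])  # 2 1
```

Counter.most_common() is not a drop-in replacement for the sorted() call: words with equal counts come back in the order they were first encountered, not alphabetically, so the (-count, word) key is still needed.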


Insights:

  • The function count_word_frequency(txt) takes a text input, removes punctuation and converts all words to lowercase, then counts the frequency of each word.
  • The str.translate() method removes punctuation from the text. Its translation table is built with str.maketrans('', '', string.punctuation), whose third argument lists the characters to map to None, which deletes them. Note that string.punctuation covers only ASCII punctuation.
  • The code converts all words to lowercase before counting their frequencies. This ensures that words like "This" and "this" are treated as the same word.
  • The frequency of each word is stored in a dictionary called word_frequency, where the key is the word and the value is the count of occurrences.
  • The code splits the text into words with split(), which treats any run of whitespace as a delimiter, and iterates over each word. Inside the loop, dict.get() with a default of 0 lets the code increment the count for every word, including one seen for the first time.
  • After counting, the code sorts the dictionary's (word, count) pairs with the key lambda x: (-x[1], x[0]). Negating the count puts higher frequencies first, and the word breaks ties in ascending alphabetical order.
  • The resulting sorted list of (word, count) tuples is converted into a Pandas DataFrame with columns 'Word' and 'Frequency'. The DataFrame is printed with print(df.to_string(index=False)), which omits the index.
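The two-level sort can also be delegated to pandas. A sketch, assuming the frequency dictionary has already been computed (it is hard-coded here from the example): DataFrame.sort_values accepts a list of columns together with a matching list of ascending flags.

```python
import pandas as pd

# Frequencies for the example input, hard-coded for illustration.
freq = {'hello': 1, 'world': 1, 'this': 2, 'is': 2, 'a': 2, 'test': 2, 'good': 1}

# Build the DataFrame unsorted, then sort by Frequency (descending)
# and, within equal frequencies, by Word (ascending).
df = pd.DataFrame(list(freq.items()), columns=['Word', 'Frequency'])
df = df.sort_values(by=['Frequency', 'Word'], ascending=[False, True])

# sort_values keeps the original row labels, but index=False hides them anyway.
print(df.to_string(index=False))
```

This keeps the sorting logic next to the column names, at the cost of building the DataFrame before the order is known.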
Happy Coding! :)

