192. Word Frequency

2019-06-12 00:27

192. Word Frequency (LeetCode)
Write a bash script to calculate the frequency of each word in a text file words.txt.

For simplicity's sake, you may assume:
words.txt contains only lowercase characters and space ' ' characters.
Each word must consist of lowercase characters only.
Words are separated by one or more whitespace characters.

Example:
Assume that words.txt has the following content:

the day is sunny the the
the sunny is is

Your script should output the following, sorted by descending frequency:

 the 4
 is 3
 sunny 2
 day 1

Note:
Don't worry about handling ties; it is guaranteed that each word's frequency count is unique.
Could you write it in one line using Unix pipes?

Solution

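cat words.txt | tr -s ' ' '\n' | sort | uniq -c | sort -nr | awk '{ print $2, $1 }'
tr -s ' ' '\n': replaces each space with a newline and squeezes runs of repeated newlines into one, so several spaces in a row act as a single separator and each word ends up on its own line
sort: sorts the lines so that identical words become adjacent, which uniq needs in order to count them
uniq -c: collapses adjacent duplicate lines and prefixes each remaining line with its count
sort -nr: sorts by count in descending order; -n compares the counts as numbers, which a plain text sort with -r alone does not guarantee
awk '{ print $2, $1 }': formats the output, printing the word first and the count second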
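
To see the stages working together, the pipeline can be wrapped in a function and run on the example input. This is a minimal sketch: the function name word_freq and the scratch file are assumptions made for illustration, and it uses sort -nr so that counts are compared numerically.

```shell
#!/usr/bin/env bash
# word_freq is a hypothetical helper name, not part of the original solution.
# It reads the file given as $1 and prints "word count" lines, highest count first.
word_freq() {
    tr -s ' ' '\n' < "$1" |     # one word per line, repeated separators squeezed
        sort |                  # bring identical words next to each other
        uniq -c |               # prefix each distinct word with its count
        sort -nr |              # numeric sort on the count, descending
        awk '{ print $2, $1 }'  # swap the columns to "word count"
}

# Recreate the example input from the problem statement in a scratch file.
sample=$(mktemp)
printf 'the day is sunny the the\nthe sunny is is\n' > "$sample"

word_freq "$sample"
rm -f "$sample"
```

Reading the file with a redirect instead of cat is only a stylistic choice here; the stages after it are the same as in the one-liner.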

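Suggestions
There are many ways to solve this problem. It helps to read up on awk first; the other solutions are much easier to follow afterwards.
Most technical knowledge is built on earlier work: those who came before plant the trees, and those who come after enjoy the shade. We all stand on the shoulders of others, and what is written above is likewise what I learned from other people and then organized into notes.
Finally, here is one more approach:
awk '{for (i=1; i <= NF; i++) a[$i]++} END {for (j in a) {print j, a[j] | "sort -k2,2nr"}}' words.txt
Here awk counts every field in the array a, and sort -k2,2nr orders the printed lines by the second field (the count), numerically and in descending order.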
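
The awk approach can also be laid out over several lines, which makes the two phases (counting, then printing) easier to see. This is a sketch under stated assumptions: the function name word_freq_awk and the array name count are hypothetical, and the rows are piped to sort -k2,2nr so that they are ordered by the count in the second column rather than by the word.

```shell
#!/usr/bin/env bash
# word_freq_awk is a hypothetical helper name used only for illustration.
word_freq_awk() {
    awk '
        # For every input line, bump the counter of each field on that line.
        { for (i = 1; i <= NF; i++) count[$i]++ }
        # After the last line, print "word count" pairs in arbitrary order.
        END { for (w in count) print w, count[w] }
    ' "$1" | sort -k2,2nr   # order by the numeric second column, descending
}

# Recreate the example input from the problem statement in a scratch file.
sample=$(mktemp)
printf 'the day is sunny the the\nthe sunny is is\n' > "$sample"

word_freq_awk "$sample"
rm -f "$sample"
```

Sorting outside awk instead of inside the END block gives the same result; it only keeps the awk program focused on counting.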

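Command mnemonics:
tr: short for translate; it translates (maps) or deletes characters
-s: short for squeeze (--squeeze-repeats in GNU tr); it squeezes a run of repeated characters into one
sort: v. to classify, to put in order
uniq: from unique, the only one of its kind
-r: reverse, as in reversed order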
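
The -r flag only reverses whatever ordering sort is already using, so it matters whether that ordering is textual or numeric. A small sketch with made-up numbers (the variable names are illustrative only):

```shell
#!/usr/bin/env bash
# Text ordering reversed: lines are compared character by character,
# so "9" ranks above "10" because the character 9 is greater than 1.
text_order=$(printf '9\n10\n2\n' | sort -r | tr '\n' ' ')

# Numeric ordering reversed: -n compares values, -r puts the largest first.
num_order=$(printf '9\n10\n2\n' | sort -nr | tr '\n' ' ')

echo "sort -r : $text_order"
echo "sort -nr: $num_order"
```

This is why a frequency listing is usually sorted with -n in addition to -r.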

My thanks to the experts who came before me!
