- Passing functions to Spark (be careful: a function that references its containing object drags the whole object into the closure, so that object must be serializable)
import org.apache.spark.rdd.RDD

class SearchFunctions(val query: String) {
  def isMatch(s: String): Boolean = {
    s.contains(query)
  }
  def getMatchesFunctionReference(rdd: RDD[String]): RDD[String] = {
    // Problem: "isMatch" means "this.isMatch", so we pass all of "this"
    rdd.filter(isMatch)
  }
  def getMatchesFieldReference(rdd: RDD[String]): RDD[Array[String]] = {
    // Problem: "query" means "this.query", so we pass all of "this"
    rdd.map(x => x.split(query))
  }
  def getMatchesNoReference(rdd: RDD[String]): RDD[Array[String]] = {
    // Safe: extract just the field we need into a local variable
    val query_ = this.query
    rdd.map(x => x.split(query_))
  }
}
Note that passing in local serializable variables, or functions that are
members of a top-level object, is always safe
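The closure-capture pitfall can be demonstrated without a cluster: Scala function literals are serializable, so a closure serializes only if everything it captures does. A minimal sketch, assuming a hypothetical non-serializable `Search` class and a `serializes` helper (illustration only, not Spark API):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a class holding a query; deliberately NOT Serializable.
class Search(val query: String) {
  // Captures `this`, because `query` really means `this.query`.
  def matcherCapturingThis: String => Boolean = s => s.contains(query)

  // Copies the field into a local first, so only the String is captured.
  def matcherWithLocalCopy: String => Boolean = {
    val q = query
    s => s.contains(q)
  }
}

// Returns true if the object survives Java serialization, the same
// mechanism Spark uses when shipping closures to executors.
def serializes(obj: AnyRef): Boolean =
  try {
    new ObjectOutputStream(new ByteArrayOutputStream).writeObject(obj)
    true
  } catch { case _: NotSerializableException => false }

val se = new Search("spark")
val bad = serializes(se.matcherCapturingThis)  // captures non-serializable Search
val good = serializes(se.matcherWithLocalCopy) // captures only a String
```

The first closure fails to serialize because it drags in the whole `Search` instance; the second succeeds because it captured only the local `String`.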
- Basic RDD transformations and actions
- map, flatMap
- set operations: union, distinct, intersection, subtract, cartesian. These require multiple RDDs of the same element type; pay attention to the ones that need a shuffle (e.g. distinct, intersection, subtract)
- reduce: combines elements with a binary function; the result type must match the element type
- collect: brings the entire RDD back to the driver
- count
- fold: a curried function; the "zero value" in the first parameter list seeds the accumulator on each partition before the binary function is applied
- aggregate: takes an initial value, a function to accumulate values within each partition, and a second function to merge the accumulated values across partitions; unlike fold, the result type may differ from the element type
- foreach: runs a function on each element on the worker nodes, without returning results to the driver
- take(n): returns n elements; the result is not in any particular order
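The fold and aggregate contracts can be sketched with plain Scala collections, which share the same curried shape; the two-element `partitions` list below is a hypothetical stand-in for an RDD's partitions:

```scala
val nums = List(1, 2, 3, 4)

// fold is curried: the zero value in the first parameter list seeds the
// accumulator; the binary function in the second combines values.
val sum = nums.fold(0)(_ + _)

// aggregate semantics, emulated over two hypothetical "partitions":
// seqOp folds within each partition, combOp merges the per-partition
// accumulators. The accumulator type (sum, count) differs from Int.
val partitions = List(List(1, 2), List(3, 4))
val (total, count) = partitions
  .map(_.foldLeft((0, 0)) { case ((s, c), x) => (s + x, c + 1) }) // seqOp
  .reduce { case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) }     // combOp
val mean = total.toDouble / count
```

Spark's `RDD.aggregate(zeroValue)(seqOp, combOp)` follows the same shape, which is why it can compute an average in one pass while fold cannot.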
- RDDs are implicitly converted to wrapper classes that add type-specific operations, e.g. RDD[Double] to DoubleRDDFunctions (adding mean, stdev, etc.)
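The mechanism behind this is an ordinary implicit conversion. A minimal sketch of the same pattern, using a hypothetical `DoubleSeqFunctions` over `Seq[Double]` rather than Spark's actual classes:

```scala
// Hypothetical analogue of Spark's DoubleRDDFunctions: an implicit class
// that enriches Seq[Double] with numeric helpers, the same way Spark's
// implicit conversion enriches RDD[Double].
object DoubleSeqSyntax {
  implicit class DoubleSeqFunctions(val xs: Seq[Double]) {
    def mean: Double = xs.sum / xs.size
    def variance: Double = {
      val m = mean
      xs.map(x => (x - m) * (x - m)).sum / xs.size
    }
    def stdev: Double = math.sqrt(variance)
  }
}

import DoubleSeqSyntax._
val xs = Seq(1.0, 2.0, 3.0, 4.0)
// The compiler rewrites this to new DoubleSeqFunctions(xs).mean
val m = xs.mean
```

Since `Seq[Double]` has no `mean` member, the compiler searches implicits in scope and applies the conversion silently, which is exactly how `rdd.mean()` works on an RDD[Double].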
- Persist (caches partition computation results; when memory fills, cached partitions are evicted in LRU order)
MEMORY_ONLY (default for cache(); deserialized objects in memory)
MEMORY_ONLY_SER (serialized in memory; less space, more CPU to read)
MEMORY_AND_DISK (spills partitions that do not fit in memory to disk)
MEMORY_AND_DISK_SER (serialized in memory, spills to disk)
DISK_ONLY
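The LRU eviction idea can be sketched with java.util.LinkedHashMap in access order; this is purely illustrative of the policy, not Spark's actual cache implementation:

```scala
import java.util.{LinkedHashMap => JLinkedHashMap}
import java.util.Map.Entry

// Tiny LRU cache: accessOrder = true moves entries to the back on each
// access, and removeEldestEntry evicts the front when over capacity.
class LruCache[K, V](capacity: Int)
    extends JLinkedHashMap[K, V](16, 0.75f, true) {
  override protected def removeEldestEntry(eldest: Entry[K, V]): Boolean =
    size() > capacity
}

val cache = new LruCache[String, String](2)
cache.put("partition-0", "data0")
cache.put("partition-1", "data1")
cache.get("partition-0")          // touch partition-0: now most recently used
cache.put("partition-2", "data2") // evicts partition-1, the least recently used
```

Under the same policy, a cached partition that has not been read recently is the first to go when a new partition needs the space.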