MapReduce: Processing Data on Large Clusters [Original Text and Translation].rar
This document is a compressed archive containing the following files:


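Description
Original document posted by member xiaowei
MapReduce: Processing Data on Large Clusters [Original Text and Translation]
Includes both the Chinese translation and the English original. The content is detailed and complete; recommended for download and reference.
Chinese: 16,573 characters
English: 34,600 characters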
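Abstract (translated from the Chinese version)
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to produce intermediate key/value pairs, and a reduce function that merges all intermediate values that share the same key. As this paper shows, many real-world tasks can be described with this model. Programs written in this functional style are automatically parallelized and executed on a cluster of commodity machines. The run-time system partitions the input data, schedules the program's execution across a set of machines, handles machine failures, and manages communication between machines. This lets programmers with no experience in parallel and distributed systems easily use the resources of a large distributed system. Our MapReduce implementation runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented, and more than one thousand MapReduce jobs run on Google's clusters every day ......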
Abstract
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day ......
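The map and reduce functions described in the abstract can be illustrated with a small word-count sketch. This is a single-process Python toy, not the distributed implementation the paper describes: it does no partitioning, scheduling, or failure handling, and the names `map_word_count`, `reduce_word_count`, and `run_mapreduce` are hypothetical, chosen only for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_word_count(doc_name: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit an intermediate (word, 1) pair for every word in the text."""
    for word in text.lower().split():
        yield word, 1


def reduce_word_count(word: str, counts: List[int]) -> int:
    """Reduce step: merge all values that share the same intermediate key."""
    return sum(counts)


def run_mapreduce(
    inputs: Iterable[Tuple[str, str]],
    map_fn: Callable[[str, str], Iterable[Tuple[str, int]]],
    reduce_fn: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Hypothetical in-memory driver; a real system would spread this across machines."""
    # Map phase: apply map_fn to every input pair and group the emitted
    # values by their intermediate key.
    intermediate: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            intermediate[out_key].append(out_value)

    # Reduce phase: call reduce_fn once per distinct intermediate key.
    return {key: reduce_fn(key, values) for key, values in intermediate.items()}


documents = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog and the fox"),
]
word_counts = run_mapreduce(documents, map_word_count, reduce_word_count)
print(word_counts)
```

The user writes only the two small functions; the grouping by key in between is the part a run-time system would take over and parallelize.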