A HDFS-Based High Performance Parallel Framework for Extracting Seismic Gathers from Large Scale Seismic Datasets

  • Chao Li
  • Jiamin Wen
  • Changhai Zhao
  • Jiguo Du
  • Minqiang Shang
  • Zengbo Wang
  • Haihua Yan
  • Yida Wang

Abstract

In the petroleum industry, potential oil reservoirs are located by analyzing seismic data. Seismic data consists of a series of fixed-size traces. In recent years, seismic datasets have grown to hundreds of terabytes, and the dataset of a large 3D survey area typically contains billions of traces. A seismic gather is a collection of traces that share the same attributes. During interactive analysis, seismic data is accessed by gather. Because the traces in a gather are usually scattered across the whole dataset, extracting a gather translates into a series of random reads from the parallel file system, which significantly reduces efficiency. In this paper, we propose an HDFS-based framework to improve the efficiency of gather extraction. First, the traces of a gather are divided into groups according to data locality, so that each node only needs to read data from its own local disks, using multiple threads. Second, a dynamic load-balancing mechanism based on the block replication feature of HDFS prevents the performance degradation caused by lagging nodes. Last, fault tolerance is supported to ensure data integrity in case of node failures. Experimental results demonstrate that the framework offers high scalability, good load balancing, and high availability, and that it performs more than 178 times better than the traditional approach when using 25 nodes.
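The locality-based grouping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the `block_replicas` mapping, and the node names are hypothetical, and the greedy "least-loaded replica holder" rule is a static stand-in for the dynamic load balancing the paper describes. The sketch also assumes that the block size is a multiple of the trace size, so no trace straddles two blocks.

```python
def group_traces_by_node(trace_indices, trace_size, block_size, block_replicas):
    """Assign each trace of a gather to one node that stores its block locally.

    block_replicas maps a block index to the list of nodes holding a replica.
    This layout is hypothetical; a real system would obtain it from HDFS.
    """
    groups = {}
    for idx in trace_indices:
        # Fixed-size traces: the byte offset determines the containing block.
        block = (idx * trace_size) // block_size
        # Among the replica holders, pick the node with the fewest traces so
        # far (ties broken by name) so that work is spread across replicas.
        node = min(block_replicas[block],
                   key=lambda n: (len(groups.get(n, ())), n))
        groups.setdefault(node, []).append(idx)
    return groups


# Illustrative layout: 3 blocks of 4 traces each, 2 replicas per block.
replicas = {0: ["n1", "n2"], 1: ["n2", "n3"], 2: ["n1", "n3"]}
plan = group_traces_by_node([0, 5, 9, 2, 6], trace_size=100,
                            block_size=400, block_replicas=replicas)
```

Each node would then read only the traces in its own group, which keeps every read on a local disk.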
Published
2017-09-30
Section
Articles