When I run my jobs I see: parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
It is set to 5 by default, but what is it? And how can I use it to get better performance?
Best answer
Yes, it defaults to 5.
The configuration parameter is named parquet.metadata.read.parallelism. It only controls how many threads are used to read the metadata (the file footers) of Parquet files.
I believe it does not affect performance much, since it only concerns reading the metadata, not the data itself.