Printing Spark SQL Plans (1): The Unresolved Logical Plan at the Source Level

The previous posts covered the main stages of Spark SQL parsing and optimization, so we can now go into the source code and look at real output.

First, git clone the Spark source code (steps omitted).

Unresolved Logical Plan

unresolvedPlan

Import the Spark project into an IDE, open AnalysisSuite.scala, and add a test():

test("foo test1") {
  val unresolvedPlan = parsePlan("WITH t(x) AS (SELECT 1) SELECT * FROM t WHERE x = 1")
  // scalastyle:off println
  println(unresolvedPlan)
  // scalastyle:on println
}

Running the test prints the unresolvedPlan:

CTE [t]
:  +- 'SubqueryAlias t
:     +- 'UnresolvedSubqueryColumnAliases [x]
:        +- 'Project [unresolvedalias(1, None)]
:           +- OneRowRelation
+- 'Project [*]
   +- 'Filter ('x = 1)
      +- 'UnresolvedRelation [t], [], false

CTE stands for common table expression. In the printed tree, a leading ' marks a node or expression that has not been resolved yet.
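The same plan can probably be printed outside the test suite as well. The following is a minimal sketch, assuming the spark-catalyst module is on the classpath and that CatalystSqlParser is available as an object in the Spark version being used; the object name PrintUnresolvedPlan is made up for illustration.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Hypothetical helper object, not part of Spark.
object PrintUnresolvedPlan {
  val sql = "WITH t(x) AS (SELECT 1) SELECT * FROM t WHERE x = 1"

  // Parse only: no analyzer runs, so relations and columns stay unresolved.
  def parse(): LogicalPlan = CatalystSqlParser.parsePlan(sql)

  def main(args: Array[String]): Unit = {
    val plan = parse()
    // treeString is the same tree rendering that println(plan) shows.
    println(plan.treeString)
    // Should be false at this stage, since nothing has been analyzed yet.
    println(s"resolved = ${plan.resolved}")
  }
}
```

The exact node names in the output may differ between Spark versions, so the tree shown earlier should be read as one example rather than a fixed format.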

toStringTree(parser)

From the test() above, step into parsePlan() and add a line that prints the AST:

println(parser.singleStatement().toStringTree(parser))

执行后, 可以得到一个 AST 的结果:

(singleStatement (statement (query (ctes WITH (namedQuery (errorCapturingIdentifier (identifier (strictIdentifier t)) errorCapturingIdentifierExtra) (identifierList ( (identifierSeq (errorCapturingIdentifier (identifier (strictIdentifier x)) errorCapturingIdentifierExtra)) )) AS ( (query (queryTerm (queryPrimary (querySpecification (selectClause SELECT (namedExpressionSeq (namedExpression (expression (booleanExpression (valueExpression (primaryExpression (constant (number 1)))))))))))) queryOrganization) ))) (queryTerm (queryPrimary (querySpecification (selectClause SELECT (namedExpressionSeq (namedExpression (expression (booleanExpression (valueExpression (primaryExpression *))))))) (fromClause FROM (relation (relationPrimary (multipartIdentifier (errorCapturingIdentifier (identifier (strictIdentifier t)) errorCapturingIdentifierExtra)) tableAlias))) (whereClause WHERE (booleanExpression (valueExpression (valueExpression (primaryExpression (identifier (strictIdentifier x)))) (comparisonOperator =) (valueExpression (primaryExpression (constant (number 1)))))))))) queryOrganization)) <EOF>)
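One way to make this LISP-style string easier to scan is to re-indent it by nesting depth. The sketch below is a plain string transformation with no Spark or ANTLR dependency; AstIndent and indent are hypothetical names. It relies on a heuristic: a "(" glued to a rule name opens a node, while a "(" or ")" standing alone is a literal SQL token. Tokens that contain spaces, such as string literals, would be split across lines.

```scala
// Hypothetical helper: re-indents the output of ANTLR's toStringTree.
object AstIndent {
  def indent(tree: String, step: Int = 2): String = {
    val out = new StringBuilder
    var depth = 0

    for (chunk <- tree.trim.split("\\s+") if chunk.nonEmpty) {
      // "(ruleName" opens a node; a lone "(" is a literal SQL token.
      val opensNode = chunk.length > 1 && chunk.head == '(' && chunk(1) != ')'
      val body = if (opensNode) chunk.tail else chunk
      // Keep the first character as text so a literal ")" token survives;
      // every ")" at the end after that closes an enclosing node.
      val closes = body.tail.reverse.takeWhile(_ == ')').length
      val text = body.dropRight(closes)

      out.append(" " * (depth * step)).append(text).append('\n')
      if (opensNode) depth += 1
      depth = math.max(0, depth - closes)
    }
    out.toString
  }
}
```

Wrapping the printed expression, for example println(AstIndent.indent(parser.singleStatement().toStringTree(parser))), would then print one rule or token per line, indented by depth.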

The raw single-line string is hard to read; the Antlr Preview plugin in the IDE shows the same tree graphically.

Antlr Preview

  • Find the SQLBaseParser.g4 file in the Spark source,
  • right-click singleStatement in the file's contents to open the context menu,
  • choose "Test rule singleStatement" to open the plugin's preview panel.

  • Then type a SQL statement in the left pane; the parse tree appears on the right.

Besides singleStatement, other rules in the grammar, such as query and statement, can be tested the same way.