The ".NET 開発基盤部会 Wiki" is operated by the "Open棟梁Project" and the "OSSコンソーシアム .NET開発基盤部会".
Noted here because it ran in the environment used for the .NET for Apache Spark tutorial.
A local install looked feasible too, but Docker seemed the easier option, so that is what is used here.
docker run -it -p 8888:8888 jupyter/pyspark-notebook
[C 03:39:12.997 NotebookApp]

    To access the notebook, open this file in a browser:
        file:///home/jovyan/.local/share/jupyter/runtime/nbserver-7-open.html
    Or copy and paste one of these URLs:
        http://xxxxx:8888/?token=xxxxx
     or http://127.0.0.1:8888/?token=xxxxx
from pyspark.sql import SparkSession

spark: SparkSession = SparkSession.builder.appName("SimpleApp").getOrCreate()

# do something to prove it works
spark.sql('SELECT "Test" as c1').show()
+----+
|  c1|
+----+
|Test|
+----+
from typing import List, Tuple

from pyspark.sql import SparkSession
from pyspark.sql import DataFrame
from pyspark.sql.types import StructField, StructType, StringType, IntegerType

Trainer = Tuple[int, str, str, int]

trainers: List[Trainer] = [
    (1, 'サトシ', 'male', 10),
    (2, 'シゲル', 'male', 10),
    (3, 'カスミ', 'female', 12),
]

trainers_schema = StructType([
    StructField('id', IntegerType(), True),
    StructField('name', StringType(), True),
    StructField('gender', StringType(), True),
    StructField('age', IntegerType(), True),
])

trainers_df: DataFrame = spark.createDataFrame(
    spark.sparkContext.parallelize(trainers),
    trainers_schema
)

trainers_df.show()
+---+------+------+---+
| id|  name|gender|age|
+---+------+------+---+
|  1|サトシ|  male| 10|
|  2|シゲル|  male| 10|
|  3|カスミ|female| 12|
+---+------+------+---+
result = trainers_df.collect()
print(result)