Data Processing in Spark (1).json
{"paragraphs":[{"text":"%md\nSpark DataFrame - Scala API Basics\n==================================\n\nAdvantages of using Scala\n- It's a JVM based language. So, ease of interoperability is high\n- DataFrame API is very similar to SQL. So, easy to quickly try and build pipelines even for beginners.\n- No data serialization / deserialization is required (like in the case of python). This is changing however - [link](https://twitter.com/databricks/status/927991555075575808)","user":"anonymous","dateUpdated":"2018-11-23T15:36:05+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h1>Spark DataFrame - Scala API Basics</h1>\n<p>Advantages of using Scala<br/>- It’s a JVM based language. So, ease of interoperability is high<br/>- DataFrame API is very similar to SQL. So, easy to quickly try and build pipelines even for beginners.<br/>- No data serialization / deserialization is required (like in the case of python). This is changing however - <a href=\"https://twitter.com/databricks/status/927991555075575808\">link</a></p>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081416_-731759467","id":"20170630-150113_1832147768","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:235","dateFinished":"2018-11-23T13:18:38+0000","dateStarted":"2018-11-23T13:18:35+0000"},{"text":"%md\n## 1. Read data in csv format\n\nSpark has the capability to read the data in a wide variety of formats. They include CSV, Parquet, JSON, Avro etc.\n\nPro Tip: The fastest way to read data in Spark is using Parquet format.\nAdvantages parquet format offers us:\n 1. 
Data can be partitioned by year, month, day etc for faster reads later.\n 2. Columnar compressed format. Expect 5-10x compression.","user":"anonymous","dateUpdated":"2018-11-23T15:36:05+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>1. Read data in csv format</h2>\n<p>Spark has the capability to read the data in a wide variety of formats. They include CSV, Parquet, JSON, Avro etc.</p>\n<p>Pro Tip: The fastest way to read data in Spark is using Parquet format.<br/>Advantages parquet format offers us:<br/> 1. Data can be partitioned by year, month, day etc for faster reads later.<br/> 2. Columnar compressed format. Expect 5-10x compression.</p>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081422_1916287725","id":"20171028-151319_319813096","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:236","dateFinished":"2018-11-23T14:27:47+0000","dateStarted":"2018-11-23T14:27:47+0000"},{"text":"val data = spark.read.option(\"header\", true).csv(\"/data/train.csv\")\n\ndata.show(5)","user":"anonymous","dateUpdated":"2018-11-23T15:36:36+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"data: org.apache.spark.sql.DataFrame = [Semana: string, Agencia_ID: string ... 
9 more fields]\n+------+----------+--------+--------+----------+-----------+-------------+---------+---------------+-----------+-----------------+\n|Semana|Agencia_ID|Canal_ID|Ruta_SAK|Cliente_ID|Producto_ID|Venta_uni_hoy|Venta_hoy|Dev_uni_proxima|Dev_proxima|Demanda_uni_equil|\n+------+----------+--------+--------+----------+-----------+-------------+---------+---------------+-----------+-----------------+\n| 3| 1110| 7| 3301| 15766| 1212| 3| 25.14| 0| 0.0| 3|\n| 3| 1110| 7| 3301| 15766| 1216| 4| 33.52| 0| 0.0| 4|\n| 3| 1110| 7| 3301| 15766| 1238| 4| 39.32| 0| 0.0| 4|\n| 3| 1110| 7| 3301| 15766| 1240| 4| 33.52| 0| 0.0| 4|\n| 3| 1110| 7| 3301| 15766| 1242| 3| 22.92| 0| 0.0| 3|\n+------+----------+--------+--------+----------+-----------+-------------+---------+---------------+-----------+-----------------+\nonly showing top 5 rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081422_-1505046362","id":"20171028-152138_1323127176","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:237","dateFinished":"2018-11-23T13:20:32+0000","dateStarted":"2018-11-23T13:19:36+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=0","http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=1"],"interpreterSettingId":"spark"}}},{"user":"anonymous","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"markdown","editOnDblClick":true,"completionKey":"TAB","completionSupport":false},"editorMode":"ace/mode/markdown","editorHide":true,"tableHide":false},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542979330229_880676048","id":"20181123-132210_459070326","dateCreated":"2018-11-23T13:22:10+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:8724","text":"%md\n## 2. 
Write data in parquet format","dateUpdated":"2018-11-23T15:36:12+0000","dateFinished":"2018-11-23T14:27:52+0000","dateStarted":"2018-11-23T14:27:52+0000","results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>2. Write data in parquet format</h2>\n</div>"}]}},{"text":"val cols = Seq(\"Week\", \"SalesDepotID\", \"SalesChannelID\", \"RouteID\", \"ClientID\", \"ProductID\", \"SalesUnitThisWeek\", \"SalesThisWeek\", \"ReturnsUnitThisWeek\",\n \"ReturnsNextWeek\", \"Demand\")\n\nval data = spark.read.option(\"header\", true).csv(\"/data/train.csv\").toDF(cols :_*)\n \ndata.write.mode(SaveMode.Overwrite).parquet(\"/data/train.parquet\")","user":"anonymous","dateUpdated":"2018-11-23T15:36:05+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":false},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"cols: Seq[String] = List(Week, SalesDepotID, SalesChannelID, RouteID, ClientID, ProductID, SalesUnitThisWeek, SalesThisWeek, ReturnsUnitThisWeek, ReturnsNextWeek, Demand)\ndata: org.apache.spark.sql.DataFrame = [Week: string, SalesDepotID: string ... 
9 more fields]\n"}]},"apps":[],"jobName":"paragraph_1542979081422_-1389147277","id":"20171109-043855_204032803","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:238","dateFinished":"2018-11-23T13:55:06+0000","dateStarted":"2018-11-23T13:49:38+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=8","http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=9"],"interpreterSettingId":"spark"}}},{"user":"anonymous","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"markdown","editOnDblClick":true,"completionKey":"TAB","completionSupport":false},"editorMode":"ace/mode/markdown","editorHide":true,"tableHide":false},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542979545118_-1447199358","id":"20181123-132545_393473157","dateCreated":"2018-11-23T13:25:45+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:8972","text":"%md\n## 3. Read data in parquet format","dateUpdated":"2018-11-23T15:36:12+0000","dateFinished":"2018-11-23T14:27:57+0000","dateStarted":"2018-11-23T14:27:57+0000","results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>3. Read data in parquet format</h2>\n</div>"}]}},{"text":"val data = spark.read.parquet(\"/data/train.parquet\")\n","user":"anonymous","dateUpdated":"2018-11-23T15:36:43+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"data: org.apache.spark.sql.DataFrame = [Week: string, SalesDepotID: string ... 
9 more fields]\n"}]},"apps":[],"jobName":"paragraph_1542979081423_153615540","id":"20171111-071958_409605100","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:240","dateFinished":"2018-11-23T13:57:00+0000","dateStarted":"2018-11-23T13:56:52+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=11"],"interpreterSettingId":"spark"}}},{"text":"%md\n## 4. Inspect the data","user":"anonymous","dateUpdated":"2018-11-23T15:36:06+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>4. Inspect the data</h2>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081424_1892806281","id":"20171028-152704_1574835336","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:241","dateFinished":"2018-11-23T14:28:01+0000","dateStarted":"2018-11-23T14:28:01+0000"},{"text":"data","user":"anonymous","dateUpdated":"2018-11-23T15:36:45+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"res54: org.apache.spark.sql.DataFrame = [Week: string, SalesDepotID: string ... 
9 more fields]\n"}]},"apps":[],"jobName":"paragraph_1542979081424_-1865633769","id":"20171028-154040_29398163","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:242","dateFinished":"2018-11-23T14:25:06+0000","dateStarted":"2018-11-23T14:25:05+0000"},{"text":"%md\nThe statement above displays no data. This is where you have to know the difference between transformations and actions. Spark doesn't start \ncomputation until an action is called. Actions include show, collect, take, head and count. Note that limit is a transformation, not an action.","user":"anonymous","dateUpdated":"2018-11-23T15:36:06+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{"0":{"graph":{"mode":"table","height":300,"optionOpen":false}}},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<p>The statement above displays no data. This is where you have to know the difference between transformations and actions. Spark doesn’t start<br/>computation until an action is called. 
Actions include show, collect, take, head and count. Note that limit is a transformation, not an action.</p>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081425_1575981455","id":"20170629-123336_1779344926","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:243","dateFinished":"2018-11-23T13:57:13+0000","dateStarted":"2018-11-23T13:57:13+0000"},{"text":"data.show(3)","user":"anonymous","dateUpdated":"2018-11-23T15:36:49+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+----+------------+--------------+-------+--------+---------+-----------------+-------------+-------------------+---------------+------+\n|Week|SalesDepotID|SalesChannelID|RouteID|ClientID|ProductID|SalesUnitThisWeek|SalesThisWeek|ReturnsUnitThisWeek|ReturnsNextWeek|Demand|\n+----+------------+--------------+-------+--------+---------+-----------------+-------------+-------------------+---------------+------+\n| 9| 1461| 1| 1201| 831447| 34213| 1| 19.94| 0| 0.0| 1|\n| 9| 1461| 1| 1201| 831447| 34255| 2| 32.0| 0| 0.0| 2|\n| 9| 1461| 1| 1201| 831447| 35571| 1| 21.39| 0| 0.0| 1|\n+----+------------+--------------+-------+--------+---------+-----------------+-------------+-------------------+---------------+------+\nonly showing top 3 rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081425_1461019521","id":"20171028-154013_2045848151","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:244","dateFinished":"2018-11-23T14:25:19+0000","dateStarted":"2018-11-23T14:25:11+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web 
UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=29"],"interpreterSettingId":"spark"}}},{"user":"anonymous","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"markdown","editOnDblClick":true,"completionKey":"TAB","completionSupport":false},"editorMode":"ace/mode/markdown","editorHide":true,"tableHide":false},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542983115174_642303186","id":"20181123-142515_762458300","dateCreated":"2018-11-23T14:25:15+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:10402","text":"%md\n## 5. Viewing schema of a df","dateUpdated":"2018-11-23T15:36:12+0000","dateFinished":"2018-11-23T14:28:07+0000","dateStarted":"2018-11-23T14:28:07+0000","results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>5. Viewing schema of a df</h2>\n</div>"}]}},{"text":"data.printSchema","user":"anonymous","dateUpdated":"2018-11-23T15:36:51+0000","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"editorMode":"ace/mode/scala","tableHide":true},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542983137828_-1412265944","id":"20181123-142537_1624360134","dateCreated":"2018-11-23T14:25:37+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:10556","dateFinished":"2018-11-23T14:25:46+0000","dateStarted":"2018-11-23T14:25:46+0000","results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"root\n |-- Week: string (nullable = true)\n |-- SalesDepotID: string (nullable = true)\n |-- SalesChannelID: string (nullable = true)\n |-- RouteID: string (nullable = true)\n |-- ClientID: string (nullable = true)\n |-- ProductID: string (nullable = true)\n |-- SalesUnitThisWeek: string (nullable = true)\n |-- SalesThisWeek: string 
(nullable = true)\n |-- ReturnsUnitThisWeek: string (nullable = true)\n |-- ReturnsNextWeek: string (nullable = true)\n |-- Demand: string (nullable = true)\n\n"}]}},{"user":"anonymous","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"markdown","editOnDblClick":true,"completionKey":"TAB","completionSupport":false},"editorMode":"ace/mode/markdown","editorHide":true,"tableHide":false},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542984143880_-2075707625","id":"20181123-144223_1800615658","dateCreated":"2018-11-23T14:42:23+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:12807","text":"%md\n## 6. Count number of rows in a df","dateUpdated":"2018-11-23T15:36:13+0000","dateFinished":"2018-11-23T14:43:13+0000","dateStarted":"2018-11-23T14:43:13+0000","results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>6. Count number of rows in a df</h2>\n</div>"}]}},{"text":"data.count","user":"anonymous","dateUpdated":"2018-11-23T15:36:54+0000","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"editorMode":"ace/mode/scala","tableHide":true},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542984196843_-678713183","id":"20181123-144316_781541039","dateCreated":"2018-11-23T14:43:16+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:12909","dateFinished":"2018-11-23T14:43:32+0000","dateStarted":"2018-11-23T14:43:21+0000","results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"res102: Long = 74180464\n"}]},"runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web 
UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=47"],"interpreterSettingId":"spark"}}},{"user":"anonymous","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"markdown","editOnDblClick":true,"completionKey":"TAB","completionSupport":false},"editorMode":"ace/mode/markdown","editorHide":true,"tableHide":false},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542984206685_-319271613","id":"20181123-144326_752294919","dateCreated":"2018-11-23T14:43:26+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:13014","text":"%md\n## 7. Count number of columns in a df\n\nTry it yourself","dateUpdated":"2018-11-23T15:36:13+0000","dateFinished":"2018-11-23T14:44:53+0000","dateStarted":"2018-11-23T14:44:53+0000","results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>7. Count number of columns in a df</h2>\n<p>Try it yourself</p>\n</div>"}]}},{"text":"%md \n## 8. Selecting one column from a df","user":"anonymous","dateUpdated":"2018-11-23T15:36:06+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>8. 
Selecting one column from a df</h2>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081426_1970645964","id":"20170630-181042_315842834","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:245","dateFinished":"2018-11-23T14:45:00+0000","dateStarted":"2018-11-23T14:45:00+0000"},{"text":"data.\nselect($\"ClientID\").\nshow(7)","user":"anonymous","dateUpdated":"2018-11-23T15:36:58+0000","config":{"tableHide":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"colWidth":12,"editorMode":"ace/mode/scala","editorHide":false,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+--------+\n|ClientID|\n+--------+\n| 831447|\n| 831447|\n| 831447|\n| 831447|\n| 831447|\n| 831447|\n| 831447|\n+--------+\nonly showing top 7 rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081426_1042513320","id":"20170629-123424_706086736","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:246","dateFinished":"2018-11-23T13:57:44+0000","dateStarted":"2018-11-23T13:57:43+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=13"],"interpreterSettingId":"spark"}}},{"text":"%md\n## 9. Select multiple columns from a df","user":"anonymous","dateUpdated":"2018-11-23T15:37:05+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>9. 
Select multiple columns from a df</h2>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081427_529123551","id":"20170629-123449_1255780859","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:248","dateFinished":"2018-11-23T15:37:05+0000","dateStarted":"2018-11-23T15:37:05+0000"},{"text":"data.\nselect($\"ClientID\", $\"Demand\").\nshow(7)","user":"anonymous","dateUpdated":"2018-11-23T15:37:09+0000","config":{"tableHide":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"colWidth":12,"editorMode":"ace/mode/scala","editorHide":false,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+--------+------+\n|ClientID|Demand|\n+--------+------+\n| 831447| 1|\n| 831447| 2|\n| 831447| 1|\n| 831447| 13|\n| 831447| 4|\n| 831447| 6|\n| 831447| 35|\n+--------+------+\nonly showing top 7 rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081428_1299895696","id":"20170629-123514_1017896837","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:249","dateFinished":"2018-11-23T13:58:26+0000","dateStarted":"2018-11-23T13:58:26+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=14"],"interpreterSettingId":"spark"}}},{"text":"%md\n## 10. Rename a column in a df","user":"anonymous","dateUpdated":"2018-11-23T15:37:12+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>10. 
Rename a column in a df</h2>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081429_-1424867830","id":"20171028-154603_254656965","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:250","dateFinished":"2018-11-23T15:37:12+0000","dateStarted":"2018-11-23T15:37:12+0000"},{"text":"data.\nselect($\"Demand\" as \"DMD\").\nshow(7)","user":"anonymous","dateUpdated":"2018-11-23T15:37:19+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+---+\n|DMD|\n+---+\n| 1|\n| 2|\n| 1|\n| 13|\n| 4|\n| 6|\n| 35|\n+---+\nonly showing top 7 rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081429_-193440043","id":"20170702-120605_1824452888","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:251","dateFinished":"2018-11-23T13:58:39+0000","dateStarted":"2018-11-23T13:58:38+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=15"],"interpreterSettingId":"spark"}}},{"user":"anonymous","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"markdown","editOnDblClick":true,"completionKey":"TAB","completionSupport":false},"editorMode":"ace/mode/markdown","editorHide":true,"tableHide":false},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542983209000_1965796046","id":"20181123-142649_727660624","dateCreated":"2018-11-23T14:26:49+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:10821","text":"%md\n## 11. 
Rename multiple columns in a df","dateUpdated":"2018-11-23T15:37:21+0000","dateFinished":"2018-11-23T15:37:21+0000","dateStarted":"2018-11-23T15:37:21+0000","results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>11. Rename multiple columns in a df</h2>\n</div>"}]}},{"text":"val rename_cols = Seq(\"C_ID\", \"DMD\")\n\ndata.\nselect($\"ClientID\", $\"Demand\").\ntoDF(rename_cols:_*).\nshow(7)","user":"anonymous","dateUpdated":"2018-11-23T15:37:25+0000","config":{"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"colWidth":12,"editorMode":"ace/mode/scala","editorHide":false,"results":{},"enabled":true,"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"rename_cols: Seq[String] = List(C_ID, DMD)\n+------+---+\n| C_ID|DMD|\n+------+---+\n|831447| 1|\n|831447| 2|\n|831447| 1|\n|831447| 13|\n|831447| 4|\n|831447| 6|\n|831447| 35|\n+------+---+\nonly showing top 7 rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081430_-685788841","id":"20170630-193439_1420474628","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:253","dateFinished":"2018-11-23T14:26:46+0000","dateStarted":"2018-11-23T14:26:37+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web 
UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=31"],"interpreterSettingId":"spark"}}},{"user":"anonymous","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"markdown","editOnDblClick":true,"completionKey":"TAB","completionSupport":false},"editorMode":"ace/mode/markdown","editorHide":true,"tableHide":false},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542983310094_1440995671","id":"20181123-142830_1836584099","dateCreated":"2018-11-23T14:28:30+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:11095","text":"%md\n## 12. Add a column to a df","dateUpdated":"2018-11-23T15:37:27+0000","dateFinished":"2018-11-23T15:37:27+0000","dateStarted":"2018-11-23T15:37:27+0000","results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>12. Add a column to a df</h2>\n</div>"}]}},{"text":"data.\nselect($\"Week\", $\"SalesDepotID\").\nwithColumn(\"new\", lit(1)).\nshow(9)\n","user":"anonymous","dateUpdated":"2018-11-23T15:37:30+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+----+------------+---+\n|Week|SalesDepotID|new|\n+----+------------+---+\n| 9| 1461| 1|\n| 9| 1461| 1|\n| 9| 1461| 1|\n| 9| 1461| 1|\n| 9| 1461| 1|\n| 9| 1461| 1|\n| 9| 1461| 1|\n| 9| 1461| 1|\n| 9| 1461| 1|\n+----+------------+---+\nonly showing top 9 
rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081432_1361686675","id":"20171028-160443_1582927527","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:256","dateFinished":"2018-11-23T14:00:20+0000","dateStarted":"2018-11-23T14:00:20+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=24"],"interpreterSettingId":"spark"}}},{"user":"anonymous","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"markdown","editOnDblClick":true,"completionKey":"TAB","completionSupport":false},"editorMode":"ace/mode/markdown","editorHide":true,"tableHide":false},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542983331628_-1106488351","id":"20181123-142851_519879030","dateCreated":"2018-11-23T14:28:51+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:11189","text":"%md\n## 13. Add multiple columns to a df","dateUpdated":"2018-11-23T15:37:33+0000","dateFinished":"2018-11-23T15:37:33+0000","dateStarted":"2018-11-23T15:37:33+0000","results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>13. 
Add multiple columns to a df</h2>\n</div>"}]}},{"text":"data.\nselect($\"Week\", $\"SalesDepotID\").\nwithColumn(\"new\", lit(1)).\nwithColumn(\"today\", current_date).\nshow(9)","user":"anonymous","dateUpdated":"2018-11-23T15:37:36+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+----+------------+---+----------+\n|Week|SalesDepotID|new| today|\n+----+------------+---+----------+\n| 9| 1461| 1|2018-11-23|\n| 9| 1461| 1|2018-11-23|\n| 9| 1461| 1|2018-11-23|\n| 9| 1461| 1|2018-11-23|\n| 9| 1461| 1|2018-11-23|\n| 9| 1461| 1|2018-11-23|\n| 9| 1461| 1|2018-11-23|\n| 9| 1461| 1|2018-11-23|\n| 9| 1461| 1|2018-11-23|\n+----+------------+---+----------+\nonly showing top 9 rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081433_-1079554945","id":"20171028-161236_367739926","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:258","dateFinished":"2018-11-23T14:00:43+0000","dateStarted":"2018-11-23T14:00:43+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=25"],"interpreterSettingId":"spark"}}},{"text":"%md\n## 14. Drop a column from a df","user":"anonymous","dateUpdated":"2018-11-23T15:37:37+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>14. 
Drop a column from a df</h2>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081434_-1510465724","id":"20171028-161329_1884848401","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:260","dateFinished":"2018-11-23T15:37:37+0000","dateStarted":"2018-11-23T15:37:37+0000"},{"text":"data.\ndrop(\"Demand\").\nshow(9)","user":"anonymous","dateUpdated":"2018-11-23T15:36:07+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":false},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+----+------------+--------------+-------+--------+---------+-----------------+-------------+-------------------+---------------+\n|Week|SalesDepotID|SalesChannelID|RouteID|ClientID|ProductID|SalesUnitThisWeek|SalesThisWeek|ReturnsUnitThisWeek|ReturnsNextWeek|\n+----+------------+--------------+-------+--------+---------+-----------------+-------------+-------------------+---------------+\n| 9| 1461| 1| 1201| 831447| 34213| 1| 19.94| 0| 0.0|\n| 9| 1461| 1| 1201| 831447| 34255| 2| 32.0| 0| 0.0|\n| 9| 1461| 1| 1201| 831447| 35571| 1| 21.39| 0| 0.0|\n| 9| 1461| 1| 1201| 831447| 36711| 13| 97.5| 0| 0.0|\n| 9| 1461| 1| 1201| 831447| 43118| 4| 39.64| 0| 0.0|\n| 9| 1461| 1| 1201| 831447| 43197| 6| 50.28| 0| 0.0|\n| 9| 1461| 1| 1201| 831447| 43206| 35| 157.5| 0| 0.0|\n| 9| 1461| 1| 1201| 831447| 43207| 12| 36.24| 0| 0.0|\n| 9| 1461| 1| 1201| 831571| 43118| 2| 19.82| 0| 0.0|\n+----+------------+--------------+-------+--------+---------+-----------------+-------------+-------------------+---------------+\nonly showing top 9 
rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081434_-271155638","id":"20171028-161339_1462506866","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:261","dateFinished":"2018-11-23T14:29:45+0000","dateStarted":"2018-11-23T14:29:37+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=32"],"interpreterSettingId":"spark"}}},{"text":"%md\n## 15. Drop multiple columns from a df","user":"anonymous","dateUpdated":"2018-11-23T15:37:44+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>15. Drop multiple columns from a df</h2>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081435_1883719610","id":"20170630-180346_1102345919","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:262","dateFinished":"2018-11-23T15:37:44+0000","dateStarted":"2018-11-23T15:37:44+0000"},{"text":"data.\ndrop(\"Week\", 
\"ClientID\").\nshow(9)","user":"anonymous","dateUpdated":"2018-11-23T15:37:47+0000","config":{"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"colWidth":12,"editorMode":"ace/mode/scala","editorHide":false,"results":{},"enabled":true,"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+------------+--------------+-------+---------+-----------------+-------------+-------------------+---------------+------+\n|SalesDepotID|SalesChannelID|RouteID|ProductID|SalesUnitThisWeek|SalesThisWeek|ReturnsUnitThisWeek|ReturnsNextWeek|Demand|\n+------------+--------------+-------+---------+-----------------+-------------+-------------------+---------------+------+\n| 1461| 1| 1201| 34213| 1| 19.94| 0| 0.0| 1|\n| 1461| 1| 1201| 34255| 2| 32.0| 0| 0.0| 2|\n| 1461| 1| 1201| 35571| 1| 21.39| 0| 0.0| 1|\n| 1461| 1| 1201| 36711| 13| 97.5| 0| 0.0| 13|\n| 1461| 1| 1201| 43118| 4| 39.64| 0| 0.0| 4|\n| 1461| 1| 1201| 43197| 6| 50.28| 0| 0.0| 6|\n| 1461| 1| 1201| 43206| 35| 157.5| 0| 0.0| 35|\n| 1461| 1| 1201| 43207| 12| 36.24| 0| 0.0| 12|\n| 1461| 1| 1201| 43118| 2| 19.82| 0| 0.0| 2|\n+------------+--------------+-------+---------+-----------------+-------------+-------------------+---------------+------+\nonly showing top 9 rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081436_-131627837","id":"20170630-180406_886238385","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:264","dateFinished":"2018-11-23T14:02:05+0000","dateStarted":"2018-11-23T14:02:05+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=27"],"interpreterSettingId":"spark"}}},{"text":"%md\n## 16. 
Convert a column of type T1 to type T2\n\nType-casting a column is useful when its type was not defined appropriately when the data was stored. Also, arithmetic operations are slightly faster on columns that are already int/float than on columns that have to be cast implicitly.","user":"anonymous","dateUpdated":"2018-11-23T15:37:50+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false,"completionKey":"TAB"},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>16. Convert a column of type T1 to type T2</h2>\n<p>Type-casting a column is useful when its type was not defined appropriately when the data was stored. Also, arithmetic operations are slightly faster on columns that are already int/float than on columns that have to be cast implicitly.</p>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081437_2040793816","id":"20171028-161458_529413323","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:265","dateFinished":"2018-11-23T15:37:50+0000","dateStarted":"2018-11-23T15:37:50+0000"},{"text":"data.\nselect($\"Demand\".cast(\"float\") as \"Demand_float\").\nshow(8)","user":"anonymous","dateUpdated":"2018-11-23T15:37:55+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+------------+\n|Demand_float|\n+------------+\n| 1.0|\n| 2.0|\n| 1.0|\n| 13.0|\n| 4.0|\n| 6.0|\n| 35.0|\n| 12.0|\n+------------+\nonly showing top 8 
rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081438_-671662384","id":"20171109-045504_489871850","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:267","dateFinished":"2018-11-23T14:31:30+0000","dateStarted":"2018-11-23T14:31:22+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=33"],"interpreterSettingId":"spark"}}},{"text":"%md\n## 17. Filter rows based on a comparison operator","user":"anonymous","dateUpdated":"2018-11-23T15:37:58+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>17. 
Filter rows based on a comparison operator</h2>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081438_1909031678","id":"20171028-161703_721391652","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:268","dateFinished":"2018-11-23T15:37:58+0000","dateStarted":"2018-11-23T15:37:58+0000"},{"text":"data.\nselect($\"Week\", $\"Demand\", $\"ClientID\").\nwhere($\"Demand\" > 60).\nshow(9)","user":"anonymous","dateUpdated":"2018-11-23T15:38:07+0000","config":{"tableHide":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"colWidth":12,"editorMode":"ace/mode/scala","editorHide":false,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+----+------+--------+\n|Week|Demand|ClientID|\n+----+------+--------+\n| 9| 96| 4673275|\n| 9| 80| 689399|\n| 9| 70| 1434758|\n| 9| 109| 1643359|\n| 9| 102| 2242477|\n| 9| 80| 2242477|\n| 9| 90| 2242477|\n| 9| 66| 2387733|\n| 9| 145| 2387733|\n+----+------+--------+\nonly showing top 9 rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081439_321846871","id":"20170629-123558_2037765849","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:269","dateFinished":"2018-11-23T14:32:07+0000","dateStarted":"2018-11-23T14:32:06+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web 
UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=37"],"interpreterSettingId":"spark"}}},{"user":"anonymous","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"markdown","editOnDblClick":true,"completionKey":"TAB","completionSupport":false},"editorMode":"ace/mode/markdown","editorHide":true,"tableHide":false},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542983550064_-828893129","id":"20181123-143230_1058895418","dateCreated":"2018-11-23T14:32:30+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:11839","text":"%md\n## 18. Filter based on multiple conditions","dateUpdated":"2018-11-23T15:38:09+0000","dateFinished":"2018-11-23T15:38:09+0000","dateStarted":"2018-11-23T15:38:09+0000","results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>18. Filter based on multiple conditions</h2>\n</div>"}]}},{"text":"data.\nfilter($\"Demand\" > 60 && $\"Demand\" < 100).\nselect($\"ClientID\", $\"Demand\", $\"ProductID\").\nshow(9)","user":"anonymous","dateUpdated":"2018-11-23T15:38:12+0000","config":{"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"colWidth":12,"editorMode":"ace/mode/scala","editorHide":false,"results":{},"enabled":true,"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+--------+------+---------+\n|ClientID|Demand|ProductID|\n+--------+------+---------+\n| 4673275| 96| 43206|\n| 689399| 80| 43206|\n| 1434758| 70| 43206|\n| 2242477| 80| 35571|\n| 2242477| 90| 43206|\n| 2387733| 66| 35571|\n| 144655| 77| 34213|\n| 144655| 72| 43206|\n| 150807| 72| 43206|\n+--------+------+---------+\nonly showing top 9 
rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081440_-1808162555","id":"20170630-184820_1293140504","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:271","dateFinished":"2018-11-23T14:33:24+0000","dateStarted":"2018-11-23T14:33:14+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=39"],"interpreterSettingId":"spark"}}},{"text":"%md\n\n## 19. Filtering on string columns \nTry it yourself\n\n## 20. Filtering based on regex and substrings\nTry it yourself\n\nHint: Try using the *\"===\", .contains, .like, .rlike* operators (*.rlike* takes a regular expression)","user":"anonymous","dateUpdated":"2018-11-23T15:36:08+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>19. Filtering on string columns</h2>\n<p>Try it yourself</p>\n<h2>20. Filtering based on regex and substrings</h2>\n<p>Try it yourself</p>\n<p>Hint: Try using the <em>“===”, .contains, .like, .rlike</em> operators (<em>.rlike</em> takes a regular expression)</p>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081440_-355846682","id":"20170629-123823_485303408","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:272","dateFinished":"2018-11-23T14:45:44+0000","dateStarted":"2018-11-23T14:45:44+0000"},{"text":"%md\n## 21. 
Filter rows on exact match from a list","user":"anonymous","dateUpdated":"2018-11-23T15:38:18+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>21. Filter rows on exact match from a list</h2>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081441_1234533198","id":"20170629-130522_905310665","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:273","dateFinished":"2018-11-23T15:38:18+0000","dateStarted":"2018-11-23T15:38:18+0000"},{"text":"val week_values = Seq(3, 5, 6)\n\ndata.\nselect(\"ClientID\", \"Week\", \"SalesThisWeek\", \"Demand\").\nwhere($\"Week\".isin(week_values:_*)).\nshow(9)","user":"anonymous","dateUpdated":"2018-11-23T15:38:20+0000","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"editorMode":"ace/mode/scala","tableHide":true},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542983717023_-250486329","id":"20181123-143517_194531236","dateCreated":"2018-11-23T14:35:17+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:12056","dateFinished":"2018-11-23T14:35:55+0000","dateStarted":"2018-11-23T14:35:47+0000","results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"week_values: Seq[Int] = List(3, 5, 6)\n+--------+----+-------------+------+\n|ClientID|Week|SalesThisWeek|Demand|\n+--------+----+-------------+------+\n| 179031| 6| 8.43| 1|\n| 179031| 6| 30.0| 5|\n| 179031| 6| 80.82| 9|\n| 179031| 6| 113.31| 9|\n| 179036| 6| 27.0| 3|\n| 179036| 6| 18.5| 5|\n| 179036| 6| 12.0| 2|\n| 179036| 6| 20.28| 2|\n| 179036| 6| 
26.94| 3|\n+--------+----+-------------+------+\nonly showing top 9 rows\n\n"}]},"runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=40"],"interpreterSettingId":"spark"}}},{"text":"data.\nselect(\"ClientID\", \"Week\", \"SalesThisWeek\", \"Demand\").\nwhere($\"Week\".isin(3, 4, 5, 7, 8)).\nshow(9)","user":"anonymous","dateUpdated":"2018-11-23T15:38:23+0000","config":{"tableHide":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"colWidth":12,"editorMode":"ace/mode/scala","editorHide":false,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+--------+----+-------------+------+\n|ClientID|Week|SalesThisWeek|Demand|\n+--------+----+-------------+------+\n| 1169124| 7| 67.5| 15|\n| 1216337| 7| 21.39| 0|\n| 1216337| 7| 16.76| 2|\n| 1216337| 7| 38.2| 5|\n| 1216337| 7| 22.92| 3|\n| 1216337| 7| 30.2| 10|\n| 1216337| 7| 193.5| 43|\n| 1222108| 7| 42.78| 2|\n| 1222108| 7| 8.38| 0|\n+--------+----+-------------+------+\nonly showing top 9 rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081442_1443994506","id":"20170629-130833_780017670","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:274","dateFinished":"2018-11-23T14:36:12+0000","dateStarted":"2018-11-23T14:36:10+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=42"],"interpreterSettingId":"spark"}}},{"text":"%md\n## 22. 
Filter rows excluding values from a list","user":"anonymous","dateUpdated":"2018-11-23T15:38:26+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>22. Filter rows excluding values from a list</h2>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081442_2009121612","id":"20170629-130913_1487177821","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:275","dateFinished":"2018-11-23T15:38:26+0000","dateStarted":"2018-11-23T15:38:26+0000"},{"text":"data.\nfilter(!($\"Week\".isin(2, 3))).\nselect(\"ClientID\", \"Week\", \"SalesThisWeek\", \"Demand\").\nshow(7)","user":"anonymous","dateUpdated":"2018-11-23T15:38:29+0000","config":{"tableHide":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"colWidth":12,"editorMode":"ace/mode/scala","editorHide":false,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+--------+----+-------------+------+\n|ClientID|Week|SalesThisWeek|Demand|\n+--------+----+-------------+------+\n| 831447| 9| 19.94| 1|\n| 831447| 9| 32.0| 2|\n| 831447| 9| 21.39| 1|\n| 831447| 9| 97.5| 13|\n| 831447| 9| 39.64| 4|\n| 831447| 9| 50.28| 6|\n| 831447| 9| 157.5| 35|\n+--------+----+-------------+------+\nonly showing top 7 
rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081443_143991546","id":"20170629-130932_1358198527","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:276","dateFinished":"2018-11-23T14:36:37+0000","dateStarted":"2018-11-23T14:36:37+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=43"],"interpreterSettingId":"spark"}}},{"user":"anonymous","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"markdown","editOnDblClick":true,"completionKey":"TAB","completionSupport":false},"editorMode":"ace/mode/markdown","editorHide":true,"tableHide":false},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542983807783_-2012836703","id":"20181123-143647_517958583","dateCreated":"2018-11-23T14:36:47+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:12379","text":"%md\n## 23. Drop duplicate rows","dateUpdated":"2018-11-23T15:36:13+0000","dateFinished":"2018-11-23T14:45:59+0000","dateStarted":"2018-11-23T14:45:59+0000","results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>23. 
Drop duplicate rows</h2>\n</div>"}]}},{"text":"data.\ndropDuplicates.\nshow(4)","user":"anonymous","dateUpdated":"2018-11-23T15:38:32+0000","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"editorMode":"ace/mode/scala","tableHide":true},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542983836800_-718319579","id":"20181123-143716_471918846","dateCreated":"2018-11-23T14:37:16+0000","status":"ABORT","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:12469","dateFinished":"2018-11-23T14:39:38+0000","dateStarted":"2018-11-23T14:37:49+0000","results":{"code":"ERROR","msg":[{"type":"TEXT","data":"org.apache.spark.SparkException: Job 44 cancelled part of cancelled job group zeppelin-2DWQU9D7S-20181123-143716_471918846\n at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1803)\n at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1738)\n at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply$mcVI$sp(DAGScheduler.scala:851)\n at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply(DAGScheduler.scala:851)\n at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply(DAGScheduler.scala:851)\n at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)\n at org.apache.spark.scheduler.DAGScheduler.handleJobGroupCancelled(DAGScheduler.scala:851)\n at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1993)\n at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1973)\n at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1962)\n at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)\n at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:682)\n at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)\n at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)\n at org.apache.spark.SparkContext.runJob(SparkContext.scala:2074)\n at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:363)\n at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)\n at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3278)\n at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)\n at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)\n at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3259)\n at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)\n at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)\n at org.apache.spark.sql.Dataset.head(Dataset.scala:2489)\n at org.apache.spark.sql.Dataset.take(Dataset.scala:2703)\n at org.apache.spark.sql.Dataset.showString(Dataset.scala:254)\n at org.apache.spark.sql.Dataset.show(Dataset.scala:723)\n at org.apache.spark.sql.Dataset.show(Dataset.scala:682)\n ... 
52 elided\n"}]},"runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=44"],"interpreterSettingId":"spark"}}},{"user":"anonymous","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"markdown","editOnDblClick":true,"completionKey":"TAB","completionSupport":false},"editorMode":"ace/mode/markdown","editorHide":true,"tableHide":false},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542983883335_-441603102","id":"20181123-143803_960018526","dateCreated":"2018-11-23T14:38:03+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:12590","text":"%md\n## 24. Drop duplicate rows based on multiple columns\n","dateUpdated":"2018-11-23T15:36:13+0000","dateFinished":"2018-11-23T14:46:04+0000","dateStarted":"2018-11-23T14:46:04+0000","results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>24. 
Drop duplicate rows based on multiple columns</h2>\n</div>"}]}},{"text":"data.\ndropDuplicates(\"Week\", \"Demand\").\nshow(6)","user":"anonymous","dateUpdated":"2018-11-23T15:38:36+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+----+------------+--------------+-------+--------+---------+-----------------+-------------+-------------------+---------------+------+\n|Week|SalesDepotID|SalesChannelID|RouteID|ClientID|ProductID|SalesUnitThisWeek|SalesThisWeek|ReturnsUnitThisWeek|ReturnsNextWeek|Demand|\n+----+------------+--------------+-------+--------+---------+-----------------+-------------+-------------------+---------------+------+\n| 3| 24669| 1| 1051| 166757| 34264| 310| 5731.8| 0| 0.0| 310|\n| 3| 2246| 2| 1509| 1002810| 34867| 315| 11837.7| 0| 0.0| 315|\n| 3| 3214| 1| 2140| 571353| 30572| 42| 262.5| 0| 0.0| 42|\n| 3| 24669| 1| 1152| 185193| 43206| 49| 220.5| 0| 0.0| 49|\n| 3| 4079| 5| 3007| 653378| 43202| 502| 5577.22| 0| 0.0| 502|\n| 3| 2244| 2| 1529| 54857| 34210| 539| 12240.69| 0| 0.0| 539|\n+----+------------+--------------+-------+--------+---------+-----------------+-------------+-------------------+---------------+------+\nonly showing top 6 rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081444_-1444819115","id":"20171028-164221_152374871","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:278","dateFinished":"2018-11-23T14:41:34+0000","dateStarted":"2018-11-23T14:40:11+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=45"],"interpreterSettingId":"spark"}}},{"text":"%md\n## 25. 
Replace a string with a certain value in a column in a df\nTry it yourself\n\n## 26. Replace a regex pattern with a certain value in a column in a df\nTry it yourself\n\n## 27. Extract substrings from a column in a df\nTry it yourself\n\n## 28. Concatenate strings from multiple columns in a df\nTry it yourself","user":"anonymous","dateUpdated":"2018-11-23T15:36:09+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>25. Replace a string with a certain value in a column in a df</h2>\n<p>Try it yourself</p>\n<h2>26. Replace a regex pattern with a certain value in a column in a df</h2>\n<p>Try it yourself</p>\n<h2>27. Extract substrings from a column in a df</h2>\n<p>Try it yourself</p>\n<h2>28. Concatenate strings from multiple columns in a df</h2>\n<p>Try it yourself</p>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081445_1125868066","id":"20171028-164657_1557594498","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:281","dateFinished":"2018-11-23T14:46:30+0000","dateStarted":"2018-11-23T14:46:30+0000"},{"text":"%md\n## 29. Sort rows based on a single column","user":"anonymous","dateUpdated":"2018-11-23T15:36:09+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>29. 
Sort rows based on a single column</h2>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081446_-516959788","id":"20170629-150912_1864982686","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:283","dateFinished":"2018-11-23T14:46:38+0000","dateStarted":"2018-11-23T14:46:38+0000"},{"text":"data.\nselect($\"Week\", $\"Demand\").\nsort(desc(\"Demand\")).\nshow(9)","user":"anonymous","dateUpdated":"2018-11-23T15:38:45+0000","config":{"tableHide":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"colWidth":12,"editorMode":"ace/mode/scala","editorHide":false,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+----+------+\n|Week|Demand|\n+----+------+\n| 5| 999|\n| 8| 999|\n| 5| 998|\n| 7| 998|\n| 4| 998|\n| 3| 998|\n| 3| 998|\n| 7| 998|\n| 6| 998|\n+----+------+\nonly showing top 9 rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081446_-2101606185","id":"20170629-150104_26782218","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:284","dateFinished":"2018-11-23T14:47:07+0000","dateStarted":"2018-11-23T14:46:49+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web 
UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=49"],"interpreterSettingId":"spark"}}},{"user":"anonymous","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"markdown","editOnDblClick":true,"completionKey":"TAB","completionSupport":false},"editorMode":"ace/mode/markdown","editorHide":true,"tableHide":false},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542984415124_-500572491","id":"20181123-144655_822510200","dateCreated":"2018-11-23T14:46:55+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:13512","text":"%md\n## 30. Sort rows based on multiple columns","dateUpdated":"2018-11-23T15:36:13+0000","dateFinished":"2018-11-23T14:47:57+0000","dateStarted":"2018-11-23T14:47:57+0000","results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>30. Sort rows based on multiple columns</h2>\n</div>"}]}},{"text":"data.\nsort(desc(\"Demand\"), asc(\"Week\")).\nselect($\"Week\", $\"Demand\").\nshow(9)","user":"anonymous","dateUpdated":"2018-11-23T15:38:50+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+----+------+\n|Week|Demand|\n+----+------+\n| 5| 999|\n| 8| 999|\n| 3| 998|\n| 3| 998|\n| 4| 998|\n| 5| 998|\n| 6| 998|\n| 6| 998|\n| 7| 998|\n+----+------+\nonly showing top 9 rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081447_1367056131","id":"20171028-165711_1776558992","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:286","dateFinished":"2018-11-23T14:48:29+0000","dateStarted":"2018-11-23T14:48:15+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK 
JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=50"],"interpreterSettingId":"spark"}}},{"text":"%md\n## 31. Aggregate data on a column","user":"anonymous","dateUpdated":"2018-11-23T15:36:09+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>31. Aggregate data on a column</h2>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081448_2106408783","id":"20170630-154651_592712664","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:287","dateFinished":"2018-11-23T14:48:44+0000","dateStarted":"2018-11-23T14:48:44+0000"},{"text":"data.\nagg(\n mean($\"Demand\")\n).\nshow","user":"anonymous","dateUpdated":"2018-11-23T15:38:52+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+------------------+\n| avg(Demand)|\n+------------------+\n|7.2245640038056385|\n+------------------+\n\n"}]},"apps":[],"jobName":"paragraph_1542979081448_1874361261","id":"20171028-170820_667440827","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:288","dateFinished":"2018-11-23T14:48:54+0000","dateStarted":"2018-11-23T14:48:51+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web 
UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=53"],"interpreterSettingId":"spark"}}},{"text":"%md\n## 32. Aggregate data using multiple columns","user":"anonymous","dateUpdated":"2018-11-23T15:36:09+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>32. Aggregate data using multiple columns</h2>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081449_817968730","id":"20171028-170909_2024229991","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:289","dateFinished":"2018-11-23T14:49:11+0000","dateStarted":"2018-11-23T14:49:11+0000"},{"text":"data.\ngroupBy($\"Week\", $\"ProductID\").\nagg(\n count($\"*\") as \"count\",\n mean($\"Demand\") as \"mean\"\n).\nshow(9)","user":"anonymous","dateUpdated":"2018-11-23T15:38:55+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+----+---------+------+------------------+\n|Week|ProductID| count| mean|\n+----+---------+------+------------------+\n| 9| 32303| 19760| 5.330971659919029|\n| 9| 31519| 29|11.586206896551724|\n| 9| 134| 43| 35.0|\n| 9| 37400| 86|3.2674418604651163|\n| 9| 46138| 5| 202.8|\n| 9| 32886| 13|19.692307692307693|\n| 6| 35525| 670| 35.04626865671642|\n| 6| 31506| 13102| 8.493588765074035|\n| 6| 1238|166625| 3.13377344336084|\n+----+---------+------+------------------+\nonly showing top 9 
rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081450_-6098006","id":"20171028-171005_823475246","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:290","dateFinished":"2018-11-23T14:49:42+0000","dateStarted":"2018-11-23T14:49:34+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=54"],"interpreterSettingId":"spark"}}},{"user":"anonymous","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"markdown","editOnDblClick":true,"completionKey":"TAB","completionSupport":false},"editorMode":"ace/mode/markdown","editorHide":true,"tableHide":false},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542986715225_-579725380","id":"20181123-152515_523198589","dateCreated":"2018-11-23T15:25:15+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:16151","text":"%md\n## 33. Aggregations conditioned on other columns","dateUpdated":"2018-11-23T15:36:14+0000","dateFinished":"2018-11-23T15:25:27+0000","dateStarted":"2018-11-23T15:25:27+0000","results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>33. 
Aggregations conditioned on other columns</h2>\n</div>"}]}},{"text":"data.\ngroupBy($\"Week\", $\"ProductID\").\nagg(\n count($\"ProductID\") as \"count\",\n mean(when($\"Demand\" > 1, $\"Demand\")) as \"mean\"\n).\nshow(9)","user":"anonymous","dateUpdated":"2018-11-23T15:38:58+0000","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"editorMode":"ace/mode/scala","tableHide":true},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542986730389_1952150897","id":"20181123-152530_615503400","dateCreated":"2018-11-23T15:25:30+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:16241","dateFinished":"2018-11-23T15:27:21+0000","dateStarted":"2018-11-23T15:27:13+0000","results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+----+---------+------+------------------+\n|Week|ProductID| count| mean|\n+----+---------+------+------------------+\n| 9| 32303| 19760| 6.308256315465188|\n| 9| 31519| 29|11.586206896551724|\n| 9| 134| 43| 35.0|\n| 9| 37400| 86| 5.511111111111111|\n| 9| 46138| 5| 202.8|\n| 9| 32886| 13|21.333333333333332|\n| 6| 35525| 670| 35.51285930408472|\n| 6| 31506| 13102| 9.134365427601622|\n| 6| 1238|166625| 4.198560191974403|\n+----+---------+------+------------------+\nonly showing top 9 rows\n\n"}]},"runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web 
UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=87"],"interpreterSettingId":"spark"}}},{"user":"anonymous","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"markdown","editOnDblClick":true,"completionKey":"TAB","completionSupport":false},"editorMode":"ace/mode/markdown","editorHide":true,"tableHide":false},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542986871422_-404989961","id":"20181123-152751_1065853306","dateCreated":"2018-11-23T15:27:51+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:16657","text":"%md\n## 34. Percentile function\nTry it yourself","dateUpdated":"2018-11-23T15:36:14+0000","dateFinished":"2018-11-23T15:28:15+0000","dateStarted":"2018-11-23T15:28:15+0000","results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>34. Percentile function</h2>\n<p>Try it yourself</p>\n</div>"}]}},{"text":"%md \n## 35. Pivot a df","user":"anonymous","dateUpdated":"2018-11-23T15:36:09+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>35. 
Pivot a df</h2>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081450_-1098671898","id":"20171028-170928_307988706","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:291","dateFinished":"2018-11-23T15:28:21+0000","dateStarted":"2018-11-23T15:28:21+0000"},{"text":"data.\ngroupBy(\"ClientID\").\npivot(\"Week\").\nagg(\n sum(\"Demand\")\n).\nshow(7)","user":"anonymous","dateUpdated":"2018-11-23T15:39:01+0000","config":{"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"colWidth":12,"editorMode":"ace/mode/scala","editorHide":false,"results":{},"enabled":true,"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+--------+-----+-----+-----+-----+-----+-----+-----+\n|ClientID| 3| 4| 5| 6| 7| 8| 9|\n+--------+-----+-----+-----+-----+-----+-----+-----+\n| 1529827| 52.0| 91.0|110.0| 66.0|112.0| 82.0| 71.0|\n| 1558679|206.0|445.0|440.0|436.0|327.0|286.0|354.0|\n| 588231| 7.0| 9.0| 10.0| 11.0| 10.0| 19.0| 14.0|\n| 579782| 35.0| 31.0| 32.0| 11.0| 28.0| 77.0| 40.0|\n| 383353| 62.0| 75.0| 39.0| 23.0| 20.0| 58.0| 51.0|\n| 2170405| 25.0| null| 3.0| 17.0| 22.0|127.0| 9.0|\n| 665280| 10.0| 22.0| 21.0| 92.0| 42.0| 44.0| 14.0|\n+--------+-----+-----+-----+-----+-----+-----+-----+\nonly showing top 7 rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081451_-1439430170","id":"20170630-160120_1109927041","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:294","dateFinished":"2018-11-23T14:51:56+0000","dateStarted":"2018-11-23T14:51:17+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=55","http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=56"],"interpreterSettingId":"spark"}}},{"text":"%md\n## 36. 
Melt a df\nTry it yourself","user":"anonymous","dateUpdated":"2018-11-23T15:36:09+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>36. Melt a df</h2>\n<p>Try it yourself</p>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081452_-358415742","id":"20171109-052259_746103219","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:295","dateFinished":"2018-11-23T15:31:29+0000","dateStarted":"2018-11-23T15:31:29+0000"},{"text":"%md\n## 37. Windowed aggregations\n","user":"anonymous","dateUpdated":"2018-11-23T15:36:14+0000","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"markdown","editOnDblClick":true,"completionKey":"TAB","completionSupport":false},"editorMode":"ace/mode/markdown","editorHide":true,"tableHide":false},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542986919598_1258358760","id":"20181123-152839_1218595521","dateCreated":"2018-11-23T15:28:39+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:16813","dateFinished":"2018-11-23T15:29:21+0000","dateStarted":"2018-11-23T15:29:21+0000","results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>37. 
Windowed aggregations</h2>\n</div>"}]}},{"text":"import org.apache.spark.sql.functions.{lead, lag}\nimport org.apache.spark.sql.expressions.Window\n","user":"anonymous","dateUpdated":"2018-11-23T15:39:04+0000","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"editorMode":"ace/mode/scala","tableHide":true},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542987086523_-1326671517","id":"20181123-153126_838472494","dateCreated":"2018-11-23T15:31:26+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:17221","dateFinished":"2018-11-23T15:31:36+0000","dateStarted":"2018-11-23T15:31:35+0000","results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"import org.apache.spark.sql.functions.{lead, lag}\nimport org.apache.spark.sql.expressions.Window\n"}]}},{"user":"anonymous","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"editorMode":"ace/mode/scala","tableHide":true},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542987099153_984080255","id":"20181123-153139_246207511","dateCreated":"2018-11-23T15:31:39+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:17335","text":"val df = Seq((1, \"a\", 3), (2, \"a\", 10), (3, \"b\", 7), (4, \"b\", 4), (5, \"c\", 3)).\n toDF(\"day\", \"item\", \"sales\")\n \ndf.show","dateUpdated":"2018-11-23T15:39:06+0000","dateFinished":"2018-11-23T15:32:33+0000","dateStarted":"2018-11-23T15:32:32+0000","results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"df: org.apache.spark.sql.DataFrame = [day: int, item: string ... 
1 more field]\n+---+----+-----+\n|day|item|sales|\n+---+----+-----+\n| 1| a| 3|\n| 2| a| 10|\n| 3| b| 7|\n| 4| b| 4|\n| 5| c| 3|\n+---+----+-----+\n\n"}]}},{"user":"anonymous","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"editorMode":"ace/mode/scala","tableHide":true},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542987164584_383840250","id":"20181123-153244_1582763917","dateCreated":"2018-11-23T15:32:44+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:17466","text":"df.\nwithColumn(\"sales_lag_1d\", lag($\"sales\", 1).over(Window.partitionBy($\"item\").orderBy(\"day\"))).\nna.fill(0).\nshow","dateUpdated":"2018-11-23T15:39:08+0000","dateFinished":"2018-11-23T15:33:17+0000","dateStarted":"2018-11-23T15:33:14+0000","results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+---+----+-----+------------+\n|day|item|sales|sales_lag_1d|\n+---+----+-----+------------+\n| 5| c| 3| 0|\n| 3| b| 7| 0|\n| 4| b| 4| 7|\n| 1| a| 3| 0|\n| 2| a| 10| 3|\n+---+----+-----+------------+\n\n"}]},"runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web 
UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=88","http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=89","http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=90","http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=91","http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=92"],"interpreterSettingId":"spark"}}},{"user":"anonymous","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"markdown","editOnDblClick":true,"completionKey":"TAB","completionSupport":false},"editorMode":"ace/mode/markdown","editorHide":true,"tableHide":false},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542987262456_851630883","id":"20181123-153422_625810637","dateCreated":"2018-11-23T15:34:22+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:17589","text":"%md\nThis is very useful for creating lag features. Try creating a feature that holds the Demand of a product in the previous week.","dateUpdated":"2018-11-23T15:36:14+0000","dateFinished":"2018-11-23T15:35:09+0000","dateStarted":"2018-11-23T15:35:09+0000","results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<p>This is very useful for creating lag features. Try creating a feature that holds the Demand of a product in the previous week.</p>\n</div>"}]}},{"text":"%md\n## 38. Figure out missing values from a column in a df","user":"anonymous","dateUpdated":"2018-11-23T15:36:10+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>38. 
Figure out missing values from a column in a df</h2>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081452_695507918","id":"20170630-171510_105164651","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:296","dateFinished":"2018-11-23T15:29:27+0000","dateStarted":"2018-11-23T15:29:27+0000"},{"text":"data.\nwhere($\"Demand\".isNull).\nshow","user":"anonymous","dateUpdated":"2018-11-23T15:39:12+0000","config":{"tableHide":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"colWidth":12,"editorMode":"ace/mode/scala","editorHide":false,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+----+------------+--------------+-------+--------+---------+-----------------+-------------+-------------------+---------------+------+\n|Week|SalesDepotID|SalesChannelID|RouteID|ClientID|ProductID|SalesUnitThisWeek|SalesThisWeek|ReturnsUnitThisWeek|ReturnsNextWeek|Demand|\n+----+------------+--------------+-------+--------+---------+-----------------+-------------+-------------------+---------------+------+\n+----+------------+--------------+-------+--------+---------+-----------------+-------------+-------------------+---------------+------+\n\n"}]},"apps":[],"jobName":"paragraph_1542979081453_268816355","id":"20171103-053620_1422671964","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:297","dateFinished":"2018-11-23T14:52:49+0000","dateStarted":"2018-11-23T14:52:24+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web 
UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=57","http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=58","http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=59","http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=60"],"interpreterSettingId":"spark"}}},{"text":"%md\n## 39. Fill a column with missing values with a constant","user":"anonymous","dateUpdated":"2018-11-23T15:36:10+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>39. Fill a column with missing values with a constant</h2>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081453_1568373693","id":"20171103-053618_215998971","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:298","dateFinished":"2018-11-23T15:29:33+0000","dateStarted":"2018-11-23T15:29:33+0000"},{"text":"data.\nna.fill(\"-1\", Seq(\"Demand\")).\nna.fill(-999, 
Seq(\"ClientID\")).\nshow(9)","user":"anonymous","dateUpdated":"2018-11-23T15:39:14+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+----+------------+--------------+-------+--------+---------+-----------------+-------------+-------------------+---------------+------+\n|Week|SalesDepotID|SalesChannelID|RouteID|ClientID|ProductID|SalesUnitThisWeek|SalesThisWeek|ReturnsUnitThisWeek|ReturnsNextWeek|Demand|\n+----+------------+--------------+-------+--------+---------+-----------------+-------------+-------------------+---------------+------+\n| 3| 1110| 7| 3301| 15766| 1212| 3| 25.14| 0| 0.0| 3|\n| 3| 1110| 7| 3301| 15766| 1216| 4| 33.52| 0| 0.0| 4|\n| 3| 1110| 7| 3301| 15766| 1238| 4| 39.32| 0| 0.0| 4|\n| 3| 1110| 7| 3301| 15766| 1240| 4| 33.52| 0| 0.0| 4|\n| 3| 1110| 7| 3301| 15766| 1242| 3| 22.92| 0| 0.0| 3|\n| 3| 1110| 7| 3301| 15766| 1250| 5| 38.2| 0| 0.0| 5|\n| 3| 1110| 7| 3301| 15766| 1309| 3| 20.28| 0| 0.0| 3|\n| 3| 1110| 7| 3301| 15766| 3894| 6| 56.1| 0| 0.0| 6|\n| 3| 1110| 7| 3301| 15766| 4085| 4| 24.6| 0| 0.0| 4|\n+----+------------+--------------+-------+--------+---------+-----------------+-------------+-------------------+---------------+------+\nonly showing top 9 rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081454_924469954","id":"20171109-052806_2102846818","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:299","dateFinished":"2018-11-23T14:53:15+0000","dateStarted":"2018-11-23T14:53:14+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web 
UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=61"],"interpreterSettingId":"spark"}}},{"text":"%md\n## 40. Fill all the missing values in a df with a constant","user":"anonymous","dateUpdated":"2018-11-23T15:36:10+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>40. Fill all the missing values in a df with a constant</h2>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081454_-1664232153","id":"20171028-171214_1484459708","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:300","dateFinished":"2018-11-23T15:29:39+0000","dateStarted":"2018-11-23T15:29:39+0000"},{"text":"data.na.fill(-3.14).\nshow(9)","user":"anonymous","dateUpdated":"2018-11-23T15:39:16+0000","config":{"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"colWidth":12,"editorMode":"ace/mode/scala","editorHide":false,"results":{},"enabled":true,"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+----+------------+--------------+-------+--------+---------+-----------------+-------------+-------------------+---------------+------+\n|Week|SalesDepotID|SalesChannelID|RouteID|ClientID|ProductID|SalesUnitThisWeek|SalesThisWeek|ReturnsUnitThisWeek|ReturnsNextWeek|Demand|\n+----+------------+--------------+-------+--------+---------+-----------------+-------------+-------------------+---------------+------+\n| 3| 1110| 7| 3301| 15766| 1212| 3| 25.14| 0| 0.0| 3|\n| 3| 1110| 7| 3301| 15766| 1216| 4| 33.52| 0| 0.0| 4|\n| 3| 1110| 7| 3301| 15766| 1238| 4| 39.32| 0| 0.0| 4|\n| 3| 1110| 
7| 3301| 15766| 1240| 4| 33.52| 0| 0.0| 4|\n| 3| 1110| 7| 3301| 15766| 1242| 3| 22.92| 0| 0.0| 3|\n| 3| 1110| 7| 3301| 15766| 1250| 5| 38.2| 0| 0.0| 5|\n| 3| 1110| 7| 3301| 15766| 1309| 3| 20.28| 0| 0.0| 3|\n| 3| 1110| 7| 3301| 15766| 3894| 6| 56.1| 0| 0.0| 6|\n| 3| 1110| 7| 3301| 15766| 4085| 4| 24.6| 0| 0.0| 4|\n+----+------------+--------------+-------+--------+---------+-----------------+-------------+-------------------+---------------+------+\nonly showing top 9 rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081455_1962242621","id":"20170630-185002_1157831963","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:301","dateFinished":"2018-11-23T14:53:34+0000","dateStarted":"2018-11-23T14:53:33+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=62"],"interpreterSettingId":"spark"}}},{"text":"%md\n## 41. Adding a day to a date column","user":"anonymous","dateUpdated":"2018-11-23T15:36:10+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>41. 
Adding a day to a date column</h2>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081456_-339227104","id":"20171109-053004_1995299031","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:302","dateFinished":"2018-11-23T15:29:46+0000","dateStarted":"2018-11-23T15:29:46+0000"},{"text":"val df2 = data.\n withColumn(\"current_date\", current_date).\n select($\"current_date\")\n \ndf2.\nwithColumn(\"next_date\", date_add($\"current_date\", 1)).\nshow(5)\n","user":"anonymous","dateUpdated":"2018-11-23T15:39:18+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"df2: org.apache.spark.sql.DataFrame = [current_date: date]\n+------------+----------+\n|current_date| next_date|\n+------------+----------+\n| 2018-11-23|2018-11-24|\n| 2018-11-23|2018-11-24|\n| 2018-11-23|2018-11-24|\n| 2018-11-23|2018-11-24|\n| 2018-11-23|2018-11-24|\n+------------+----------+\nonly showing top 5 rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081456_-1725497629","id":"20171109-053024_2044828408","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:303","dateFinished":"2018-11-23T14:55:39+0000","dateStarted":"2018-11-23T14:55:32+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web 
UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=63"],"interpreterSettingId":"spark"}}},{"user":"anonymous","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"markdown","editOnDblClick":true,"completionKey":"TAB","completionSupport":false},"editorMode":"ace/mode/markdown","editorHide":true,"tableHide":false},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542984944418_1698659475","id":"20181123-145544_1931318490","dateCreated":"2018-11-23T14:55:44+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:14336","text":"%md\n## 42. Finding difference of two date columns","dateUpdated":"2018-11-23T15:36:13+0000","dateFinished":"2018-11-23T15:29:52+0000","dateStarted":"2018-11-23T15:29:52+0000","results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>42. Finding difference of two date columns</h2>\n</div>"}]}},{"text":"val df2 = data.\n withColumn(\"current_date\", current_date).\n withColumn(\"next_date\", date_add($\"current_date\", 3)).\n select($\"current_date\", $\"next_date\")\n \ndf2.\nwithColumn(\"difference\", datediff($\"next_date\", $\"current_date\")).\nshow","user":"anonymous","dateUpdated":"2018-11-23T15:39:21+0000","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"editorMode":"ace/mode/scala","tableHide":true},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542985042314_979673139","id":"20181123-145722_1239957581","dateCreated":"2018-11-23T14:57:22+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:14426","dateFinished":"2018-11-23T15:02:45+0000","dateStarted":"2018-11-23T15:02:45+0000","results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"df2: org.apache.spark.sql.DataFrame = [current_date: date, 
next_date: date]\n+------------+----------+----------+\n|current_date| next_date|difference|\n+------------+----------+----------+\n| 2018-11-23|2018-11-26| 3|\n| 2018-11-23|2018-11-26| 3|\n| 2018-11-23|2018-11-26| 3|\n| 2018-11-23|2018-11-26| 3|\n| 2018-11-23|2018-11-26| 3|\n| 2018-11-23|2018-11-26| 3|\n| 2018-11-23|2018-11-26| 3|\n| 2018-11-23|2018-11-26| 3|\n| 2018-11-23|2018-11-26| 3|\n| 2018-11-23|2018-11-26| 3|\n| 2018-11-23|2018-11-26| 3|\n| 2018-11-23|2018-11-26| 3|\n| 2018-11-23|2018-11-26| 3|\n| 2018-11-23|2018-11-26| 3|\n| 2018-11-23|2018-11-26| 3|\n| 2018-11-23|2018-11-26| 3|\n| 2018-11-23|2018-11-26| 3|\n| 2018-11-23|2018-11-26| 3|\n| 2018-11-23|2018-11-26| 3|\n| 2018-11-23|2018-11-26| 3|\n+------------+----------+----------+\nonly showing top 20 rows\n\n"}]},"runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=67"],"interpreterSettingId":"spark"}}},{"text":"%md\n## 43. Extract day, month, year from a date column","user":"anonymous","dateUpdated":"2018-11-23T15:36:10+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>43. 
Extract day, month, year from a date column</h2>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081457_-685715996","id":"20171109-053051_851716521","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:304","dateFinished":"2018-11-23T15:29:58+0000","dateStarted":"2018-11-23T15:29:58+0000"},{"text":"val df2 = data.\n select(current_date as \"current_date\")\n \ndf2.\nwithColumn(\"day\", dayofmonth($\"current_date\")).\nwithColumn(\"month\", month($\"current_date\")).\nwithColumn(\"year\", year($\"current_date\")).\nshow(9)","user":"anonymous","dateUpdated":"2018-11-23T15:39:27+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"df2: org.apache.spark.sql.DataFrame = [current_date: date]\n+------------+---+-----+----+\n|current_date|day|month|year|\n+------------+---+-----+----+\n| 2018-11-23| 23| 11|2018|\n| 2018-11-23| 23| 11|2018|\n| 2018-11-23| 23| 11|2018|\n| 2018-11-23| 23| 11|2018|\n| 2018-11-23| 23| 11|2018|\n| 2018-11-23| 23| 11|2018|\n| 2018-11-23| 23| 11|2018|\n| 2018-11-23| 23| 11|2018|\n| 2018-11-23| 23| 11|2018|\n+------------+---+-----+----+\nonly showing top 9 rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081457_444276105","id":"20171109-053310_459178003","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:305","dateFinished":"2018-11-23T15:01:30+0000","dateStarted":"2018-11-23T15:01:22+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=64"],"interpreterSettingId":"spark"}}},{"text":"%md\n\n## 44. 
Joining two dfs - left, right, outer, inner","user":"anonymous","dateUpdated":"2018-11-23T15:36:10+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>44. Joining two dfs - left, right, outer, inner</h2>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081458_2080188726","id":"20171109-053404_1893342184","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:306","dateFinished":"2018-11-23T15:30:03+0000","dateStarted":"2018-11-23T15:30:03+0000"},{"text":"val data1 = data.limit(3).select(\"Week\", \"Demand\")\nval data2 = data.limit(5).select(\"RouteID\", \"Demand\")\n\ndata1.\njoin(data2, Seq(\"Demand\"), \"inner\").\nshow","user":"anonymous","dateUpdated":"2018-11-23T15:39:30+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"data1: org.apache.spark.sql.DataFrame = [Week: string, Demand: string]\ndata2: org.apache.spark.sql.DataFrame = [RouteID: string, Demand: string]\n+------+----+-------+\n|Demand|Week|RouteID|\n+------+----+-------+\n| 1| 9| 1201|\n| 1| 9| 1201|\n| 2| 9| 1201|\n| 1| 9| 1201|\n| 1| 9| 
1201|\n+------+----+-------+\n\n"}]},"apps":[],"jobName":"paragraph_1542979081458_-1632262348","id":"20171109-053445_153935159","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:307","dateFinished":"2018-11-23T15:03:45+0000","dateStarted":"2018-11-23T15:03:43+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=76","http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=77"],"interpreterSettingId":"spark"}}},{"text":"data1.show","user":"anonymous","dateUpdated":"2018-11-23T15:39:32+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+----+------+\n|Week|Demand|\n+----+------+\n| 9| 1|\n| 9| 5|\n| 9| 4|\n+----+------+\n\n"}]},"apps":[],"jobName":"paragraph_1542979081459_976097345","id":"20171109-053552_1228941048","dateCreated":"2018-11-23T13:18:01+0000","status":"READY","errorMessage":"","progressUpdateIntervalMs":500,"$$hashKey":"object:308"},{"text":"data2.show","user":"anonymous","dateUpdated":"2018-11-23T15:39:33+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+-------+------+\n|RouteID|Demand|\n+-------+------+\n| 2113| 1|\n| 2113| 5|\n| 2113| 4|\n| 2113| 8|\n| 2113| 
4|\n+-------+------+\n\n"}]},"apps":[],"jobName":"paragraph_1542979081459_218774998","id":"20171109-053618_389310824","dateCreated":"2018-11-23T13:18:01+0000","status":"READY","errorMessage":"","progressUpdateIntervalMs":500,"$$hashKey":"object:309"},{"text":"%md\n## 45. Appending dataframes using union","user":"anonymous","dateUpdated":"2018-11-23T15:36:11+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>45. Appending dataframes using union</h2>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081460_552710437","id":"20171109-053621_1228959835","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:310","dateFinished":"2018-11-23T15:30:10+0000","dateStarted":"2018-11-23T15:30:10+0000"},{"text":"data1.\nunion(data2).\nshow","user":"anonymous","dateUpdated":"2018-11-23T15:39:37+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+----+------+\n|Week|Demand|\n+----+------+\n| 9| 1|\n| 9| 2|\n| 9| 1|\n|1201| 1|\n|1201| 2|\n|1201| 1|\n|1201| 13|\n|1201| 4|\n+----+------+\n\n"}]},"apps":[],"jobName":"paragraph_1542979081460_-346019227","id":"20171109-053800_261857139","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:311","dateFinished":"2018-11-23T15:04:09+0000","dateStarted":"2018-11-23T15:04:08+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK 
JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=78","http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=79"],"interpreterSettingId":"spark"}}},{"text":"%md This is obviously buggy! Try to fix this.","user":"anonymous","dateUpdated":"2018-11-23T15:36:11+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<p>This is obviously buggy! Try to fix this.</p>\n"}]},"apps":[],"jobName":"paragraph_1542979081461_-2051551348","id":"20171109-053812_12479645","dateCreated":"2018-11-23T13:18:01+0000","status":"READY","errorMessage":"","progressUpdateIntervalMs":500,"$$hashKey":"object:312"},{"text":"%md\nOften, we have to create intermediate results that we tend to join with other dataframes.\nExample: creation of count features, user features, aggregates for machine learning tasks. We join these to the main training set.\nIn such a case, creation of each set of features runs the whole pipeline of reading the dataset etc. Instead, we can cache the intermediate\ndataset so that when we join these datasets, the whole DAG isn't triggered; instead, the computation starts from the point where we cached the data.\n\n## 46. Cache - memory\n\n## 47. 
Persist - memory + disk\n\nTry it yourself\n","user":"anonymous","dateUpdated":"2018-11-23T15:36:11+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<p>Often, we have to create intermediate results that we tend to join with other dataframes.<br/>Example: creation of count features, user features, aggregates for machine learning tasks. We join these to the main training set.<br/>In such a case, creation of each set of features runs the whole pipeline of reading the dataset etc. Instead, we can cache the intermediate<br/>dataset so that when we join these datasets, the whole DAG isn’t triggered; instead, the computation starts from the point where we cached the data.</p>\n<h2>46. Cache - memory</h2>\n<h2>47. Persist - memory + disk</h2>\n<p>Try it yourself</p>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081462_-805975676","id":"20171109-054021_1058799091","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:313","dateFinished":"2018-11-23T15:30:23+0000","dateStarted":"2018-11-23T15:30:23+0000"},{"text":"val data1 = data.limit(100000)","user":"anonymous","dateUpdated":"2018-11-23T15:39:48+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"tmp1: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [Week: string, SalesDepotID: string ... 
9 more fields]\n"}]},"apps":[],"jobName":"paragraph_1542979081462_251139232","id":"20171109-054343_461890621","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:314","dateFinished":"2018-11-23T15:06:34+0000","dateStarted":"2018-11-23T15:06:33+0000"},{"text":"data1.\ngroupBy(\"ClientID\", \"RouteID\", \"Week\").\nagg(expr(\"PERCENTILE(DEMAND, 0.5)\") as \"median\").\nshow(9)","user":"anonymous","dateUpdated":"2018-11-23T15:39:52+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+--------+-------+----+------+\n|ClientID|RouteID|Week|median|\n+--------+-------+----+------+\n| 100021| 1202| 9| 2.0|\n| 100021| 2804| 9| 3.0|\n| 100025| 1014| 9| 3.5|\n| 100025| 1114| 9| 1.5|\n| 100025| 2006| 9| 5.0|\n| 1002689| 1043| 9| 4.0|\n| 1002689| 1143| 9| 3.0|\n| 1002689| 2115| 9| 3.0|\n| 1002771| 1255| 9| 4.5|\n+--------+-------+----+------+\nonly showing top 9 rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081463_-1218412804","id":"20171109-054043_1412301147","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:315","dateFinished":"2018-11-23T15:07:00+0000","dateStarted":"2018-11-23T15:06:43+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web 
UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=80"],"interpreterSettingId":"spark"}}},{"text":"data1.cache()","user":"anonymous","dateUpdated":"2018-11-23T15:40:01+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"res174: tmp1.type = [Week: string, SalesDepotID: string ... 9 more fields]\n"}]},"apps":[],"jobName":"paragraph_1542979081464_592086790","id":"20171109-054124_1909147814","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:316","dateFinished":"2018-11-23T15:11:14+0000","dateStarted":"2018-11-23T15:11:13+0000"},{"text":"data1.show","user":"anonymous","dateUpdated":"2018-11-23T15:40:11+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+----+------------+--------------+-------+--------+---------+-----------------+-------------+-------------------+---------------+------+\n|Week|SalesDepotID|SalesChannelID|RouteID|ClientID|ProductID|SalesUnitThisWeek|SalesThisWeek|ReturnsUnitThisWeek|ReturnsNextWeek|Demand|\n+----+------------+--------------+-------+--------+---------+-----------------+-------------+-------------------+---------------+------+\n| 8| 1620| 1| 2014| 4432918| 43065| 2| 13.52| 0| 0.0| 2|\n| 8| 1620| 1| 2014| 4432918| 43069| 3| 22.23| 0| 0.0| 3|\n| 8| 1620| 1| 2014| 4432918| 43084| 1| 8.15| 0| 0.0| 1|\n| 8| 1620| 1| 2014| 4432918| 43274| 11| 57.09| 0| 0.0| 11|\n| 8| 1620| 1| 2014| 4462821| 43068| 10| 38.4| 0| 0.0| 10|\n| 8| 
1620| 1| 2014| 4546104| 35453| 32| 161.6| 0| 0.0| 32|\n| 8| 1620| 1| 2014| 4546104| 35456| 12| 49.8| 0| 0.0| 12|\n| 8| 1620| 1| 2014| 4546104| 37058| 10| 75.8| 0| 0.0| 10|\n| 8| 1620| 1| 2014| 4546104| 43069| 1| 8.42| 0| 0.0| 1|\n| 8| 1620| 1| 2014| 4546104| 43307| 42| 211.68| 0| 0.0| 42|\n| 8| 1620| 1| 2014| 4547142| 35453| 7| 31.08| 0| 0.0| 7|\n| 8| 1620| 1| 2014| 4547142| 36610| 40| 30.8| 0| 0.0| 40|\n| 8| 1620| 1| 2014| 4547142| 37058| 4| 30.0| 0| 0.0| 4|\n| 8| 1620| 1| 2014| 4547142| 43064| 1| 8.15| 0| 0.0| 1|\n| 8| 1620| 1| 2014| 4547142| 43068| 10| 38.4| 0| 0.0| 10|\n| 8| 1620| 1| 2014| 4547142| 43069| 6| 44.46| 0| 0.0| 6|\n| 8| 1620| 1| 2014| 4547142| 43084| 1| 8.15| 0| 0.0| 1|\n| 8| 1620| 1| 2014| 4547142| 43215| 4| 40.56| 0| 0.0| 4|\n| 8| 1620| 1| 2014| 4547142| 43274| 1| 5.19| 0| 0.0| 1|\n| 8| 1620| 1| 2014| 4547142| 43307| 15| 79.2| 0| 0.0| 15|\n+----+------------+--------------+-------+--------+---------+-----------------+-------------+-------------------+---------------+------+\nonly showing top 20 rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081464_-2093545796","id":"20171109-054303_954150575","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:317","dateFinished":"2018-11-23T15:07:32+0000","dateStarted":"2018-11-23T15:07:29+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web 
UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=81"],"interpreterSettingId":"spark"}}},{"text":"data1.show","user":"anonymous","dateUpdated":"2018-11-23T15:40:19+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":true},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"+----+------------+--------------+-------+--------+---------+-----------------+-------------+-------------------+---------------+------+\n|Week|SalesDepotID|SalesChannelID|RouteID|ClientID|ProductID|SalesUnitThisWeek|SalesThisWeek|ReturnsUnitThisWeek|ReturnsNextWeek|Demand|\n+----+------------+--------------+-------+--------+---------+-----------------+-------------+-------------------+---------------+------+\n| 8| 1620| 1| 2014| 4432918| 43065| 2| 13.52| 0| 0.0| 2|\n| 8| 1620| 1| 2014| 4432918| 43069| 3| 22.23| 0| 0.0| 3|\n| 8| 1620| 1| 2014| 4432918| 43084| 1| 8.15| 0| 0.0| 1|\n| 8| 1620| 1| 2014| 4432918| 43274| 11| 57.09| 0| 0.0| 11|\n| 8| 1620| 1| 2014| 4462821| 43068| 10| 38.4| 0| 0.0| 10|\n| 8| 1620| 1| 2014| 4546104| 35453| 32| 161.6| 0| 0.0| 32|\n| 8| 1620| 1| 2014| 4546104| 35456| 12| 49.8| 0| 0.0| 12|\n| 8| 1620| 1| 2014| 4546104| 37058| 10| 75.8| 0| 0.0| 10|\n| 8| 1620| 1| 2014| 4546104| 43069| 1| 8.42| 0| 0.0| 1|\n| 8| 1620| 1| 2014| 4546104| 43307| 42| 211.68| 0| 0.0| 42|\n| 8| 1620| 1| 2014| 4547142| 35453| 7| 31.08| 0| 0.0| 7|\n| 8| 1620| 1| 2014| 4547142| 36610| 40| 30.8| 0| 0.0| 40|\n| 8| 1620| 1| 2014| 4547142| 37058| 4| 30.0| 0| 0.0| 4|\n| 8| 1620| 1| 2014| 4547142| 43064| 1| 8.15| 0| 0.0| 1|\n| 8| 1620| 1| 2014| 4547142| 43068| 10| 38.4| 0| 0.0| 10|\n| 8| 1620| 1| 2014| 4547142| 43069| 6| 44.46| 0| 0.0| 6|\n| 8| 1620| 1| 2014| 4547142| 43084| 1| 8.15| 0| 0.0| 1|\n| 8| 1620| 1| 2014| 4547142| 43215| 4| 40.56| 0| 0.0| 4|\n| 8| 1620| 1| 
2014| 4547142| 43274| 1| 5.19| 0| 0.0| 1|\n| 8| 1620| 1| 2014| 4547142| 43307| 15| 79.2| 0| 0.0| 15|\n+----+------------+--------------+-------+--------+---------+-----------------+-------------+-------------------+---------------+------+\nonly showing top 20 rows\n\n"}]},"apps":[],"jobName":"paragraph_1542979081465_1224538630","id":"20171109-054313_41402531","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:318","dateFinished":"2018-11-23T15:07:40+0000","dateStarted":"2018-11-23T15:07:40+0000","runtimeInfos":{"jobUrl":{"propertyName":"jobUrl","label":"SPARK JOB","tooltip":"View in Spark web UI","group":"spark","values":["http://ip-172-31-50-132.ec2.internal:4040/jobs/job?id=82"],"interpreterSettingId":"spark"}}},{"user":"anonymous","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"markdown","editOnDblClick":true,"completionKey":"TAB","completionSupport":false},"editorMode":"ace/mode/markdown","editorHide":true,"tableHide":false},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542985673149_107161183","id":"20181123-150753_213048994","dateCreated":"2018-11-23T15:07:53+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:15448","text":"%md\nDid you find a difference in the time taken to show the results when you ran it the first time vs the second time?","dateUpdated":"2018-11-23T15:36:14+0000","dateFinished":"2018-11-23T15:08:23+0000","dateStarted":"2018-11-23T15:08:23+0000","results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<p>Did you find a difference in the time taken to show the results when you ran it the first time vs the second time?</p>\n</div>"}]}},{"text":"%md\n\n## 48. Checkpointing\nIn case the job fails during the run, spark takes care of restarting the job. 
In case we cache / persist any data, the lineage / DAG \n(the steps used to compute the cached DF) is saved and the job can recompute the cached DF.\n\nHowever, if you'd like to throw the DAG away, use checkpointing. Checkpointing stores only the DF on disk and forgets how it's computed.","user":"anonymous","dateUpdated":"2018-11-23T15:36:11+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>48. Checkpointing</h2>\n<p>In case the job fails during the run, spark takes care of restarting the job. In case we cache / persist any data, the lineage / DAG<br/>(the steps used to compute the cached DF) is saved and the job can recompute the cached DF.</p>\n<p>However, if you’d like to throw the DAG away, use checkpointing. Checkpointing stores only the DF on disk and forgets how it’s computed.</p>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081466_-1637791919","id":"20171109-055134_49521634","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:321","dateFinished":"2018-11-23T15:30:33+0000","dateStarted":"2018-11-23T15:30:33+0000","focus":true},{"text":"%md\n\n## 49. 
Repartition\nRepartitioning performs a full shuffle of the data and creates roughly equal-sized partitions.\nRead up on when to use it.","user":"anonymous","dateUpdated":"2018-11-23T15:36:12+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>49. Repartition</h2>\n<p>Repartitioning performs a full shuffle of the data and creates roughly equal-sized partitions.<br/>Read up on when to use it.</p>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081467_-929982674","id":"20171109-055916_2123599818","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:323","dateFinished":"2018-11-23T15:30:39+0000","dateStarted":"2018-11-23T15:30:39+0000"},{"text":"data.repartition(100)","user":"anonymous","dateUpdated":"2018-11-23T15:40:23+0000","config":{"colWidth":12,"fontSize":9,"enabled":true,"results":{},"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"editorMode":"ace/mode/scala","tableHide":true},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542986051529_1279536731","id":"20181123-151411_1057302283","dateCreated":"2018-11-23T15:14:11+0000","status":"FINISHED","progressUpdateIntervalMs":500,"focus":true,"$$hashKey":"object:15760","dateFinished":"2018-11-23T15:14:27+0000","dateStarted":"2018-11-23T15:14:26+0000","results":{"code":"SUCCESS","msg":[{"type":"TEXT","data":"res178: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [Week: string, SalesDepotID: string ... 9 more fields]\n"}]}},{"text":"%md\n## 50. Coalesce\nCoalesce combines existing partitions to avoid a full shuffle. 
It is often used to write data as a single CSV file.\nRead up on when to use it.\n\nHeads-up: You will need it later today!","user":"anonymous","dateUpdated":"2018-11-23T15:36:12+0000","config":{"tableHide":false,"editorSetting":{"language":"markdown","editOnDblClick":true,"completionSupport":false},"colWidth":12,"editorMode":"ace/mode/markdown","editorHide":true,"results":{},"enabled":true,"fontSize":9},"settings":{"params":{},"forms":{}},"results":{"code":"SUCCESS","msg":[{"type":"HTML","data":"<div class=\"markdown-body\">\n<h2>50. Coalesce</h2>\n<p>Coalesce combines existing partitions to avoid a full shuffle. It is often used to write data as a single CSV file.<br/>Read up on when to use it.</p>\n<p>Heads-up: You will need it later today!</p>\n</div>"}]},"apps":[],"jobName":"paragraph_1542979081468_913028027","id":"20171109-060543_493134394","dateCreated":"2018-11-23T13:18:01+0000","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:324","dateFinished":"2018-11-23T15:30:48+0000","dateStarted":"2018-11-23T15:30:48+0000"},{"user":"anonymous","dateUpdated":"2018-11-23T15:36:12+0000","config":{"colWidth":12,"editorMode":"ace/mode/scala","results":{},"enabled":true,"editorSetting":{"language":"scala","editOnDblClick":false,"completionKey":"TAB","completionSupport":true},"fontSize":9,"tableHide":false},"settings":{"params":{},"forms":{}},"apps":[],"jobName":"paragraph_1542979081472_-1500037117","id":"20171109-085026_1636755805","dateCreated":"2018-11-23T13:18:01+0000","status":"READY","errorMessage":"","progressUpdateIntervalMs":500,"$$hashKey":"object:333"}],"name":"Data Processing in Spark","id":"2DWQU9D7S","noteParams":{},"noteForms":{},"angularObjects":{},"config":{"isZeppelinNotebookCronEnable":false,"looknfeel":"default","personalizedMode":"false"},"info":{}}