Add support for Date & DateTime
dfdx committed May 31, 2022
1 parent 5826cdf commit 4ff15f9
Showing 12 changed files with 92 additions and 82 deletions.
2 changes: 2 additions & 0 deletions Project.toml
@@ -3,6 +3,7 @@ uuid = "e3819d11-95af-5eea-9727-70c091663a01"
version = "0.6.0"

[deps]
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
IteratorInterfaceExtensions = "82899510-4779-5014-852e-03e436cf321d"
JavaCall = "494afd89-becb-516b-aafa-70d2670c0337"
Reexport = "189a3867-3050-52da-a836-e630ba90ab69"
@@ -17,6 +18,7 @@ IteratorInterfaceExtensions = "1"
JavaCall = "0.7, 0.8"
Reexport = "1.2"
TableTraits = "1"
Umlaut = "0.2"
julia = "1.6"

[extras]
2 changes: 1 addition & 1 deletion docs/make.jl
@@ -6,7 +6,7 @@ makedocs(
format = Documenter.HTML(),
modules = [Spark],
pages = Any[
"Introduction" => "intro.md",
"Introduction" => "index.md",
"SQL / DataFrames" => "sql.md",
"Structured Streaming" => "streaming.md",
"API Reference" => "api.md"
25 changes: 0 additions & 25 deletions docs/make_old.jl

This file was deleted.

12 changes: 7 additions & 5 deletions docs/src/api.md
@@ -1,9 +1,11 @@

```@index
```@meta
CurrentModule = Spark
```

```@docs
@chainable
DotChainer
```

```@autodocs
Modules = [Spark]
Order = [:type, :function]
```@index
```
50 changes: 0 additions & 50 deletions docs/src/intro.md

This file was deleted.

45 changes: 45 additions & 0 deletions docs/src/sql.md
@@ -1,2 +1,47 @@
```@meta
CurrentModule = Spark
```

# SQL / DataFrames

This is a quick introduction to the core Spark.jl functions. It closely follows the official [PySpark tutorial](https://spark.apache.org/docs/latest/api/python/getting_started/quickstart_df.html) and copies many examples verbatim. In most cases, the PySpark docs apply to Spark.jl as-is or with only minor adaptation.

Spark.jl applications usually start by creating a `SparkSession`:

```julia
using Spark

spark = SparkSession.builder.appName("Main").master("local").getOrCreate()
```

Note that here we use dot notation to chain function invocations. This makes the code more concise and mimics the Python API, which makes it easier to translate examples. The same example could also be written as:

```julia
using Spark
import Spark: appName, master, getOrCreate

builder = SparkSession.builder
builder = appName(builder, "Main")
builder = master(builder, "local")
spark = getOrCreate(builder)
```

See [`@chainable`](@ref) for the details of the dot notation.
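
For intuition, below is a deliberately simplified, hypothetical sketch of how this kind of dot-chaining can be built on `Base.getproperty`. The names `Chainer`, `Builder`, and the standalone `appName` are illustrative only, not Spark.jl's actual internals (those live in [`@chainable`](@ref) and [`DotChainer`](@ref)):

```julia
# Hypothetical sketch: make `x.f(args...)` desugar to `f(x, args...)`.
struct Chainer{O,F}
    obj::O
    fn::F
end
(c::Chainer)(args...; kw...) = c.fn(c.obj, args...; kw...)

struct Builder
    opts::Dict{String,String}
end
appName(b::Builder, name) = (b.opts["appName"] = name; b)

# Unknown properties become Chainers that forward to a function of
# the same name, so `b.appName("Main")` calls `appName(b, "Main")`.
function Base.getproperty(b::Builder, name::Symbol)
    name === :opts && return getfield(b, :opts)
    return Chainer(b, getfield(@__MODULE__, name))
end

b = Builder(Dict{String,String}())
b.appName("Main").opts   # Dict("appName" => "Main")
```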


## DataFrame Creation


In simple cases, a Spark DataFrame can be created via `SparkSession.createDataFrame`, e.g. from a list of rows:

```julia
using Dates

df = spark.createDataFrame([
Row(a=1, b=2.0, c="string1", d=Date(2000, 1, 1), e=DateTime(2000, 1, 1, 12, 0)),
Row(a=2, b=3.0, c="string2", d=Date(2000, 2, 1), e=DateTime(2000, 1, 2, 12, 0)),
Row(a=4, b=5.0, c="string3", d=Date(2000, 3, 1), e=DateTime(2000, 1, 3, 12, 0))
])
```
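
To take a quick look at the result, the PySpark-style `show` should work through the same dot notation (a usage sketch, assuming Spark.jl wraps `DataFrame.show` the way the PySpark tutorial uses it):

```julia
# Print the first rows of the DataFrame as an ASCII table,
# like PySpark's df.show().
df.show()
```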

2 changes: 1 addition & 1 deletion src/compiler.jl
@@ -1,5 +1,5 @@
using JavaCall
import JavaCall: assertroottask_or_goodenv, assertloaded, get_method_id
import JavaCall: assertroottask_or_goodenv, assertloaded
using Umlaut

const JInMemoryJavaCompiler = @jimport org.mdkt.compiler.InMemoryJavaCompiler
21 changes: 21 additions & 0 deletions src/convert.jl
@@ -2,8 +2,25 @@
# Conversions #
###############################################################################

# Note: neither java.sql.Timestamp nor Julia's DateTime carries a timezone.
# When printed, however, java.sql.Timestamp assumes UTC and converts to your
# local time. To avoid confusion, e.g. in the REPL, use a fixed date in UTC
# or now(Dates.UTC).
Base.convert(::Type{JTimestamp}, x::DateTime) =
JTimestamp((jlong,), floor(Int, datetime2unix(x)) * 1000)
Base.convert(::Type{DateTime}, x::JTimestamp) =
unix2datetime(jcall(x, "getTime", jlong, ()) / 1000)

Base.convert(::Type{JDate}, x::Date) =
JDate((jlong,), floor(Int, datetime2unix(DateTime(x))) * 1000)
Base.convert(::Type{Date}, x::JDate) =
Date(unix2datetime(jcall(x, "getTime", jlong, ()) / 1000))


Base.convert(::Type{JObject}, x::Integer) = convert(JObject, convert(JLong, x))
Base.convert(::Type{JObject}, x::Real) = convert(JObject, convert(JDouble, x))
Base.convert(::Type{JObject}, x::DateTime) = convert(JObject, convert(JTimestamp, x))
Base.convert(::Type{JObject}, x::Date) = convert(JObject, convert(JDate, x))
Base.convert(::Type{JObject}, x::Column) = convert(JObject, x.jcol)

Base.convert(::Type{Row}, obj::JObject) = Row(convert(JRow, obj))
@@ -25,6 +42,8 @@ java2julia(::Type{JInteger}) = Int32
java2julia(::Type{JDouble}) = Float64
java2julia(::Type{JFloat}) = Float32
java2julia(::Type{JBoolean}) = Bool
java2julia(::Type{JTimestamp}) = DateTime
java2julia(::Type{JDate}) = Date
java2julia(::Type{JObject}) = Any

julia2ddl(::Type{String}) = "string"
@@ -33,6 +52,8 @@ julia2ddl(::Type{Int32}) = "int"
julia2ddl(::Type{Float64}) = "double"
julia2ddl(::Type{Float32}) = "float"
julia2ddl(::Type{Bool}) = "boolean"
julia2ddl(::Type{Dates.Date}) = "date"
julia2ddl(::Type{Dates.DateTime}) = "timestamp"


function JArray(x::Vector{T}) where T
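
Taken together, the new methods give a lossless round trip for whole-second values, since both directions go through epoch milliseconds. A minimal sketch mirroring the new test in test/test_convert.jl (assumes an initialized JVM):

```julia
using Dates
using Spark

# Whole-second values survive the trip: the conversions above store
# floor(seconds) * 1000 as epoch milliseconds.
t = DateTime(2000, 1, 1, 12, 0)
d = Date(2000, 1, 1)

jts = convert(Spark.JTimestamp, t)   # java.sql.Timestamp
jd  = convert(Spark.JDate, d)        # java.sql.Date

@assert convert(DateTime, jts) == t
@assert convert(Date, jd) == d
```

Together with the `java2julia` and `julia2ddl` entries, this lets `Date` and `DateTime` columns map to Spark's `date` and `timestamp` types in both directions.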
1 change: 1 addition & 0 deletions src/core.jl
@@ -2,6 +2,7 @@ using JavaCall
using Umlaut
import Umlaut.V
import Statistics
using Dates
# using TableTraits
# using IteratorInterfaceExtensions

3 changes: 3 additions & 0 deletions src/defs.jl
@@ -36,6 +36,8 @@ const JLong = @jimport java.lang.Long
const JFloat = @jimport java.lang.Float
const JDouble = @jimport java.lang.Double
const JBoolean = @jimport java.lang.Boolean
const JDate = @jimport java.sql.Date
const JTimestamp = @jimport java.sql.Timestamp

const JMap = @jimport java.util.Map
const JHashMap = @jimport java.util.HashMap
@@ -46,6 +48,7 @@ const JArraySeq = @jimport scala.collection.mutable.ArraySeq
const JSeq = @jimport scala.collection.immutable.Seq


toString(jobj::JavaObject) = jcall(jobj, "toString", JString, ())


###############################################################################
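
The `toString` helper added here simply forwards to the Java object's own `toString` via `jcall`. A small usage sketch (assumes an initialized JVM; note that `java.sql` types render in the local timezone, per the note in src/convert.jl):

```julia
using Dates
using Spark

jd = convert(Spark.JDate, Date(2000, 1, 1))
Spark.toString(jd)   # e.g. "2000-01-01", rendered in the local timezone
```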
1 change: 1 addition & 0 deletions test/runtests.jl
@@ -16,6 +16,7 @@ spark = Spark.SparkSession.builder.


include("test_chainable.jl")
include("test_convert.jl")
include("test_compiler.jl")
include("test_sql.jl")

10 changes: 10 additions & 0 deletions test/test_convert.jl
@@ -0,0 +1,10 @@
using Dates

@testset "Convert" begin
# create DateTime without fractional part
t = now(Dates.UTC) |> datetime2unix |> floor |> unix2datetime
d = Date(t)

@test convert(DateTime, convert(Spark.JTimestamp, t)) == t
@test convert(Date, convert(Spark.JDate, d)) == d
end
