Introduction
In this article, we will explore how to convert easting/northing coordinates to latitude and longitude in Scala/Spark using an EPSG coordinate transformation. We will also show how to add columns to a DataFrame whose values are calculated from two existing columns.
Problem Statement
We have a Spark DataFrame containing easting and northing coordinates and want to convert them to latitude and longitude using an EPSG coordinate transformation. In other words, we need to add new columns to the DataFrame whose values are calculated from two existing columns.
Solution
To solve this problem, we can use the geotrellis-proj4 library, which provides a Scala API on top of the proj4 Java library and lets us perform the coordinate transformation.
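Before wiring it into Spark, it helps to see the transformation on its own. Here is a minimal sketch (the object name TransformSketch is illustrative; the easting/northing pair and the expected coordinates are taken from the sample data and output shown later in this article):

```scala
import geotrellis.proj4.{CRS, Transform}

object TransformSketch {
  def main(args: Array[String]): Unit = {
    // British National Grid (easting/northing) -> WGS 84 (longitude/latitude)
    val bng   = CRS.fromEpsgCode(27700)
    val wgs84 = CRS.fromEpsgCode(4326)
    val toWgs84 = Transform(bng, wgs84)

    // Transform returns (x, y), i.e. (longitude, latitude)
    val (lon, lat) = toWgs84(276164, 84185)
    println(f"longitude=$lon%.6f latitude=$lat%.6f")
  }
}
```

Note the result ordering: proj4 works in (x, y), so longitude comes first, which is why the Spark code below unpacks the tuple as (long, lat).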
Step 1: Add Dependencies
First, we need to add the following dependency to our build.sbt file (geotrellis-raster pulls in the proj4 module transitively; depending on geotrellis-proj4 directly should also work):
libraryDependencies += "org.locationtech.geotrellis" %% "geotrellis-raster" % "3.5.2"
Step 2: Implementation
Next, we can create a Scala Spark job that performs the coordinate transformation:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import geotrellis.proj4.{CRS, Transform}

object TestCode {

  def main(args: Array[String]): Unit = {
    val runLocally = true
    val jobName = "Test Spark Logging Case"

    val builder = SparkSession.builder.appName(jobName)
    implicit val spark: SparkSession =
      (if (runLocally) builder.master("local[2]") else builder).getOrCreate()

    import spark.implicits._

    // Define the columns and data
    val columns = Seq("node_id", "easting", "northing")
    val data = Seq(
      (94489, 276164, 84185),
      (94555, 428790, 92790),
      (94806, 357501, 173246),
      (99118, 439545, 336877),
      (76202, 357353, 170708)
    )
    val df = data.toDF(columns: _*)

    // Set up coordinate systems: British National Grid (EPSG:27700) -> WGS 84 (EPSG:4326)
    val eastingNorthing = CRS.fromEpsgCode(27700)
    val latLong = CRS.fromEpsgCode(4326)
    val transform = Transform(eastingNorthing, latLong)

    // UDF returning a (longitude, latitude) tuple, which Spark stores as a struct column
    val transformLatLong = udf((easting: Int, northing: Int) => {
      val (long, lat) = transform(easting, northing)
      (long, lat)
    })

    // Apply the transformation as a new struct column
    val newdf = df.withColumn("latlong",
      transformLatLong(df("easting"), df("northing")))

    // Split the struct into separate longitude/latitude columns and show the results
    newdf.select(
      col("node_id"),
      col("easting"),
      col("northing"),
      col("latlong._1").as("longitude"),
      col("latlong._2").as("latitude")
    ).show()
  }
}
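If the intermediate struct column is not wanted, the same result can be produced with two plain Double columns. The following is a self-contained sketch of that variant (the object name FlatColumnsSketch and the helper names lonUdf/latUdf are illustrative, and only the first sample row is used):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import geotrellis.proj4.{CRS, Transform}

object FlatColumnsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("FlatColumnsSketch")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((94489, 276164, 84185)).toDF("node_id", "easting", "northing")

    val transform = Transform(CRS.fromEpsgCode(27700), CRS.fromEpsgCode(4326))

    // Two scalar UDFs instead of one tuple-returning UDF: no struct column,
    // but the transform runs twice per row
    val lonUdf = udf((e: Int, n: Int) => transform(e, n)._1)
    val latUdf = udf((e: Int, n: Int) => transform(e, n)._2)

    val flatDf = df
      .withColumn("longitude", lonUdf(col("easting"), col("northing")))
      .withColumn("latitude",  latUdf(col("easting"), col("northing")))

    flatDf.show()
    spark.stop()
  }
}
```

The trade-off is readability versus work per row: the struct-column version above computes the transform once and then splits the result, while this variant keeps the DataFrame schema flat at the cost of a second transform call.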
Output
The output of running this code looks like this:
+-------+-------+--------+-------------------+------------------+
|node_id|easting|northing|          longitude|          latitude|
+-------+-------+--------+-------------------+------------------+
|  94489| 276164|   84185| -3.752810925839862| 50.73401609723385|
|  94555| 428790|   92790|-1.5934125598396651| 50.73401609723385|
|  94806| 357501|  173246|-2.6130593045676984| 51.45658738605824|
|  99118| 439545|  336877| -1.413187622652739| 52.92785156624134|
|  76202| 357353|  170708| -2.614882589162872| 51.43375699275326|
+-------+-------+--------+-------------------+------------------+
Conclusion
In this article, we demonstrated how to:
- Convert easting/northing coordinates to latitude and longitude in Scala/Spark
- Use EPSG coordinate transformation
- Add calculated columns to a DataFrame based on existing columns
The geotrellis-proj4 library provides a convenient Scala API for performing coordinate transformations, making it easy to integrate into Spark jobs. This solution is particularly useful for geographic data processing in big data applications.