Destination Databricks Delta Lake #
The extracted replicant-cli
will be referred to as the $REPLICANT_HOME
directory in the proceeding steps.
I. Setup Connection Configuration #
-
From
$REPLICANT_HOME
, navigate to the sample connection configuration file:vi conf/conn/databricks.yaml
-
Make the necessary changes as follows:
type: DATABRICKS_DELTALAKE host: localhost #Replace localhost with your Databricks host port: 43213 #Replace 43213 with the port of your Databricks cluster url: "jdbc:spark://<host>:<port>/<database-name>;transportMode=http;ssl=1;httpPath=<http-path>;AuthMech=3" #This url can be copied from databricks cluster info page username: "replicant" #Replace replicant with the user that connects to your Databricks server password: "Replicant#123" #Replace Replicant#123 with your user's password max-connections: 30 #Maximum number of connections Replicant can open in Databricks #lob-store-path: "/LOB_STORAGE" stage: #Note: You must use DATABRICKS_DBFS or an external stage like S3 to hold the data files type: S3 | DATABRICKS_DBFS #Specify your stage type root-dir: "replicate-stage/databricks-stage" #Specify the path to a directory in S3 which can be used to stage bulk-load files conn-url: "replicate-stage" #Specify the conn-url for your stage; For S3 replace replicate-stage with your bucket name #For DATABRICKS_DBFS only use-credentials: true|false #Default is false; When true, you must set host, port, username, and password in the connection configuration section #For S3 only: key-id: "<S3 access key>" #Replace <S3 access key> with your S3 access key secret-key: "<S3 secret key>" #Replace <S3 secret key> with your S3 secret key max-retries: 100 #Enter the maximum number of times Replicant can re-attempt a failed operation retry-wait-duration-ms: 1000 #Enter the time Replicant should wait between each re-try of a failed operation
II. Setup Applier Configuration #
- From
$REPLICANT_HOME
, navigate to the applier configuration file:vi conf/dst/databricks.yaml
- Make the necessary changes as follows:
snapshot: threads: 16 #Maximum number of threads Replicant should use for writing to the target #If bulk-load is used, Replicant will use the native bulk-loading capabilities of the target database bulk-load: enable: true|false #Set to true if you want to enable bulk loading type: PIPE|FILE #Specify the type of bulk loading between FILE and PIPE serialize: true|false #Set to true if you want the generated files to be applied in serial/parallel fashion realtime: threads: 4 #Maximum number of threads Replicant should use for writing to the target