Initial configuration
Before creating the extension and the database objects that use the extension, you must modify the configuration of the Postgres host to provide the location of the supporting libraries.
After installing Postgres, modify postgresql.conf, located in /var/lib/edb/as_version/data.
Modify the configuration file, adding the hdfs_fdw.jvmpath parameter to the end of the configuration file and setting the value to specify the location of the Java virtual machine (libjvm.so). Set the value of hdfs_fdw.classpath to indicate the location of the Java class files used by the adapter. Use a colon (:) as a delimiter between each path. For example:
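A minimal sketch of the two settings, using illustrative install locations; substitute the JVM directory and jar paths that actually exist on your host:

```ini
# Hypothetical paths -- point jvmpath at the directory containing libjvm.so,
# and classpath at the jar files used by the adapter, separated by colons.
hdfs_fdw.jvmpath='/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/'
hdfs_fdw.classpath='/usr/local/hive/lib/hive-jdbc-1.0.1-standalone.jar:/usr/local/hadoop/share/hadoop/common/hadoop-common-2.6.4.jar'
```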
Note
Copy the jar files (hive-jdbc-1.0.1-standalone.jar and hadoop-common-2.6.4.jar) from the respective Hive and Hadoop sources or websites to the PostgreSQL instance where the Hadoop Foreign Data Wrapper is installed.
If you're using EDB Postgres Advanced Server and have a DATE column in your database, you must set edb_redwood_date = OFF in the postgresql.conf file.
After setting the parameter values, restart the Postgres server. For detailed information about controlling the service on an EDB Postgres Advanced Server host, see the EDB Postgres Advanced Server documentation.
Before using the Hadoop Foreign Data Wrapper:
- Use the CREATE EXTENSION command to create the extension on the Postgres host.
- Use the CREATE SERVER command to define a connection to the Hadoop file system.
- Use the CREATE USER MAPPING command to define a mapping that associates a Postgres role with the server.
- Use the CREATE FOREIGN TABLE command to define a table in the EDB Postgres Advanced Server database that corresponds to a table that resides on the Hadoop cluster.
CREATE EXTENSION
Use the CREATE EXTENSION command to create the hdfs_fdw extension. To invoke the command, use your client of choice (for example, psql) to connect to the Postgres database from which you will be querying the Hive or Spark server, and invoke the command:
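```sql
CREATE EXTENSION [IF NOT EXISTS] hdfs_fdw [WITH] [SCHEMA schema_name];
```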
Parameters
IF NOT EXISTS
Include the IF NOT EXISTS clause to instruct the server to issue a notice instead of returning an error if an extension with the same name already exists.
schema_name
Optionally specify the name of the schema in which to install the extension's objects.
Example
The following command installs the hdfs_fdw Hadoop Foreign Data Wrapper:
```sql
CREATE EXTENSION hdfs_fdw;
```
For more information about using the foreign data wrapper CREATE EXTENSION command, see the PostgreSQL documentation.
CREATE SERVER
Use the CREATE SERVER command to define a connection to a foreign server. The syntax is:
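```sql
CREATE SERVER server_name
    FOREIGN DATA WRAPPER hdfs_fdw
    [OPTIONS (option 'value' [, ...])];
```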
The role that defines the server is the owner of the server. Use the ALTER SERVER command to reassign ownership of a foreign server. To create a foreign server, you must have USAGE privilege on the foreign-data wrapper specified in the CREATE SERVER command.
Parameters
server_name
Use server_name to specify a name for the foreign server. The server name must be unique in the database.
FOREIGN DATA WRAPPER
Include the FOREIGN DATA WRAPPER clause to specify that the server uses the hdfs_fdw foreign data wrapper when connecting to the cluster.
OPTIONS
Use the OPTIONS clause of the CREATE SERVER command to specify connection information for the foreign server. You can include:
| Option | Description |
|---|---|
| host | The address or hostname of the Hadoop cluster. The default value is localhost. |
| port | The port number of the Hive Thrift Server or Spark Thrift Server. The default is 10000. |
| client_type | Specify hiveserver2 or spark as the client type. To use the ANALYZE statement on Spark, you must specify a value of spark. The default value is hiveserver2. |
| auth_type | The authentication type of the client. Specify LDAP or NOSASL. If you don't specify an auth_type, the data wrapper decides the auth_type value on the basis of the user mapping. If the user mapping includes a user name and password, the data wrapper uses LDAP authentication. If the user mapping doesn't include a user name and password, the data wrapper uses NOSASL authentication. |
| connect_timeout | The length of time before a connection attempt times out. The default value is 300 seconds. |
| enable_aggregate_pushdown | Similar to the table-level option but configured at the server level. If true, pushes aggregate operations down to the foreign server instead of fetching rows from the foreign server and performing the operations locally. You can also set this option for an individual table. The table-level value of the option takes precedence over the server-level value. Default is true. |
| enable_join_pushdown | Similar to the table-level option but configured at the server level. If true, pushes down a join between two foreign tables from the same foreign server instead of fetching all the rows for both tables and performing the join locally. You can also set this option for an individual table; if any table involved in the join sets the option to false, the join isn't pushed down. The table-level value of the option takes precedence over the server-level value. Default is true. |
| enable_order_by_pushdown | Similar to the table-level option but configured at the server level. If true, pushes the order-by operation down to the foreign server instead of fetching rows from the foreign server and performing the sort locally. You can also set this option for an individual table. The table-level value of the option takes precedence over the server-level value. Default is true. |
| fetch_size | Provided as a parameter to the JDBC API setFetchSize. The default value is 10,000. |
| log_remote_sql | If true, logging includes SQL commands executed on the remote Hive server and the number of times that a scan is repeated. The default is false. |
| query_timeout | Use query_timeout to provide the number of seconds after which a request times out if it isn't satisfied by the Hive server. Query timeout isn't supported by the Hive JDBC driver. |
| use_remote_estimate | Include use_remote_estimate to instruct the server to use EXPLAIN commands on the remote server when estimating processing costs. By default, use_remote_estimate is false, and remote tables are assumed to have 1000 rows. |
Example
The following command creates a foreign server named hdfs_server that uses the hdfs_fdw foreign data wrapper to connect to a host with an IP address of 170.11.2.148:
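A sketch of the command, specifying the host and the LDAP authentication type; the port option is omitted, so the default applies:

```sql
CREATE SERVER hdfs_server
    FOREIGN DATA WRAPPER hdfs_fdw
    OPTIONS (host '170.11.2.148', auth_type 'LDAP');
```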
The foreign server uses the default port (10000) for the connection to the client on the Hadoop cluster. The connection uses an LDAP server.
For more information about using the CREATE SERVER command, see the PostgreSQL documentation.
CREATE USER MAPPING
Use the CREATE USER MAPPING command to define a mapping that associates a Postgres role with a foreign server:
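```sql
CREATE USER MAPPING FOR role_name
    SERVER server_name
    [OPTIONS (option 'value' [, ...])];
```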
You must be the owner of the foreign server to create a user mapping for that server.
Note
The Hadoop Foreign Data Wrapper supports NOSASL and LDAP authentication. If you're creating a user mapping for a server that uses LDAP authentication, use the OPTIONS clause to provide the connection credentials (the user name and password) for an existing LDAP user. If the server uses NOSASL authentication, omit the OPTIONS clause when creating the user mapping.
Parameters
role_name
Use role_name to specify the role to associate with the foreign server.
server_name
Use server_name to specify the name of the server that defines a connection to the Hadoop cluster.
OPTIONS
Use the OPTIONS clause to specify connection information for the foreign server. If you're using LDAP authentication, provide:
- username — The name of the user on the LDAP server.
- password — The password associated with the username.
If you don't provide a user name and password, the data wrapper uses NOSASL authentication.
Example
The following command creates a user mapping for a role named enterprisedb. The mapping is associated with a server named hdfs_server:
```sql
CREATE USER MAPPING FOR enterprisedb SERVER hdfs_server;
```
If the database host uses LDAP authentication, provide connection credentials when creating the user mapping:
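```sql
CREATE USER MAPPING FOR enterprisedb SERVER hdfs_server
    OPTIONS (username 'alice', password '1safepwd');
```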
The command creates a user mapping for a role named enterprisedb that is associated with a server named hdfs_server. When connecting to the LDAP server, the Hive or Spark server authenticates as alice and provides a password of 1safepwd.
For detailed information about the CREATE USER MAPPING command, see the PostgreSQL documentation.
CREATE FOREIGN TABLE
A foreign table is a pointer to a table that resides on the Hadoop host. Before creating a foreign table definition on the Postgres server, connect to the Hive or Spark server and create a table. The columns in the table map to columns in a table on the Postgres server. Then, use the CREATE FOREIGN TABLE command to define a table on the Postgres server with columns that correspond to the table that resides on the Hadoop host. The syntax is:
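```sql
CREATE FOREIGN TABLE [ IF NOT EXISTS ] table_name ( [
  { column_name data_type [ OPTIONS ( option 'value' [, ... ] ) ] [ COLLATE collation ] [ column_constraint [ ... ] ]
    | table_constraint }
    [, ... ]
] )
[ INHERITS ( parent_table [, ... ] ) ]
  SERVER server_name [ OPTIONS ( option 'value' [, ... ] ) ]
```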
where column_constraint is:
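```sql
[ CONSTRAINT constraint_name ]
{ NOT NULL |
  NULL |
  CHECK ( expression ) [ NO INHERIT ] |
  DEFAULT default_expr }
```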
and table_constraint is:
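```sql
[ CONSTRAINT constraint_name ] CHECK ( expression ) [ NO INHERIT ]
```

As a minimal illustration, the following sketch defines a foreign table over a hypothetical Hive table; the table name, columns, and the dbname and table_name option values are assumptions, so adjust them to match your own schema:

```sql
-- Hypothetical example: maps the Hive table webdata.weblogs to a local foreign table
CREATE FOREIGN TABLE weblogs (
    client_ip  TEXT,
    request    TEXT,
    status     INT
)
SERVER hdfs_server OPTIONS (dbname 'webdata', table_name 'weblogs');
```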