Greenplum dba LEARNOVITA

[50+] Big Data Greenplum DBA Interview Questions and Answers

Last updated on 26th Sep 2022, Blog, Interview Question

About author

Sanjay (Sr Big Data DevOps Engineer )

Highly experienced in his industry domain, with 7+ years of experience. He has also been a technical blog writer for the past 4 years, sharing informative content for job seekers.


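1. How do you check the distribution policy of a table, for example sales?


Describe the table with \d. The distribution key is reported in the "Distributed by:" line at the end of the output.

  • psql> \d sales
  • Table "public.sales"
    id | integer
    date | date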

2. How many user schemas are there in the database?


Use "\dn" at the psql prompt.

3. When was my table last analyzed in the Greenplum database?



4. How do you check the size of a table?


Table and index:

  • psql> select pg_size_pretty(pg_total_relation_size('schema.tablename'));
  • Replace schema.tablename with the table you want to check.

5. How do you start/stop the DB in admin mode?


Admin mode and utility mode

Running gpstart with the -R option starts the database in admin (restricted) mode, in which only superusers can connect to the database.

Utility mode lets you connect to individual segments only, when the system is started with gpstart -m. For example, to connect to the master instance only:

  • PGOPTIONS='-c gp_session_role=utility' psql

6. How do you check the schema size?


  • psql> select schemaname, round(sum(pg_total_relation_size(schemaname||'.'||tablename))/1024/1024) "Size_MB"
  • from pg_tables where schemaname='SCHEMANAME' group by 1;

7. How do you check the database size?


To see the size of a specific database:

  • psql> select pg_size_pretty(pg_database_size('DATABASE_NAME'));

Example:

  • gpdb=# select pg_size_pretty(pg_database_size('gpdb'));
  • pg_size_pretty
  • 24 MB
  • (1 row)

To see all database sizes:

  • psql> select datname, pg_size_pretty(pg_database_size(datname)) from pg_database;

8. How do you check the size of a partitioned table, including indexes and partitions?


Table size with partitions:

The following SQL gives the size of the employee_daily table, which has partitions.

  • select schemaname, tablename, round(sum(pg_total_relation_size(schemaname || '.' || partitiontablename))/1024/1024) "MB" from pg_partitions where tablename='employee_daily' group by 1,2;
    public | employee_daily | 254

9. How do I get help on the syntax to alter a table?


In a psql session, type \h alter table, which displays the syntax:

  • gpdb=# \h alter table

10. Describe the Greenplum architecture?


11. How do you connect in utility mode?


From the master host:

  • PGOPTIONS='-c gp_session_role=utility' psql -p <port> -h <hostname>

Where:

  • port is the segment/master database port.
  • hostname is the segment/master hostname.
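As a rough sketch of the connection described above, the snippet below builds the environment and argument list for a utility-mode psql session. The helper name utility_mode_command and the host and port values are hypothetical; only PGOPTIONS, gp_session_role=utility, and the psql -p/-h flags come from the answer.

```python
import os


def utility_mode_command(host, port, base_env=None):
    """Return (env, argv) for a psql session in utility mode.

    host and port are placeholders for a segment or master instance.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Same setting as PGOPTIONS='-c gp_session_role=utility' in the shell.
    env["PGOPTIONS"] = "-c gp_session_role=utility"
    argv = ["psql", "-p", str(port), "-h", host]
    return env, argv


# Hypothetical segment host and port, for illustration only.
env, argv = utility_mode_command("sdw1", 40000, base_env={})
print(env["PGOPTIONS"], argv)
```

On a host where psql is installed, the pair could be passed to subprocess.run(argv, env=env), leaving base_env unset so the current environment (including PATH) is inherited.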

12. Where/how do you find the DB logs?


Master: the master gpdb log file is located in the $MASTER_DATA_DIRECTORY/pg_log/ directory, and the file name depends on the database "log_filename" parameter.

  • $MASTER_DATA_DIRECTORY/pg_log/gpdb-yyyy-mm-dd_000000.csv: log file format with a default installation.
  • ~gpadmin/gpAdminLogs/: gpstart, gpstop, gpstate and other utility logs.

Segments: for primary segments, run the SQL below to see the log file location:

  • select dbid,hostname,datadir||'/pg_log' from gp_configuration where content not in (-1) and isprimary is true;

For mirror segments, run the SQL below to see the log file location:

  • select dbid,hostname,datadir||'/pg_log'
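To make the default master log naming concrete, here is a small sketch that builds the path of one day's log file from the gpdb-yyyy-mm-dd_000000.csv pattern quoted above. The function name and the example data directory are assumptions for illustration; a system with a different log_filename setting would not match this pattern.

```python
from datetime import date
import posixpath


def master_log_file(master_data_directory, day):
    """Path of the master CSV log for one day, using the default log_filename pattern."""
    name = "gpdb-{:%Y-%m-%d}_000000.csv".format(day)
    return posixpath.join(master_data_directory, "pg_log", name)


# /data/master/gpseg-1 is a hypothetical $MASTER_DATA_DIRECTORY.
path = master_log_file("/data/master/gpseg-1", date(2022, 9, 26))
print(path)
```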

13. How do you see the list of available functions in Greenplum DB?


  • \df schemaname.functionname (schemaname and functionname support wildcard characters)
  • test=# \df pub*.*test*
  • List of functions
    Schema | Name | Result data type
    public | bugtest | integer
    public | test | boolean
    public | test | void

14. How do you check whether the Greenplum server is up and running?


gpstate is the utility for checking gpdb status. Use gpstate -Q to show a quick status. Refer to gpstate --help for more options.

Sample output:

  • [gpadmin@stinger2]/export/home/gpadmin>gpstate -Q
  • gpadmin-[INFO]:-Obtaining GPDB array type, [Brief], please wait...
  • gpadmin-[INFO]:-Obtaining GPDB array type, [Brief], please wait...
  • gpadmin-[INFO]:-Quick Greenplum database status from Master instance only
  • gpadmin-[INFO]:----------------------------------------------------------
  • gpadmin-[INFO]:-GPDB fault action value = readonly
  • gpadmin-[INFO]:-Valid count in status view = 4
  • gpadmin-[INFO]:-Invalid count in status view = 0
  • gpadmin-[INFO]:----------------------------------------------------------

15. How do you recover an invalid segment?


Use the gprecoverseg utility. The two descriptions of running without "-F" below appear to correspond to different releases (file comparison in older 3.3.x systems, change tracking in 4.0.x).

Without the "-F" option: files are first compared, the differences found, and only the differing files are synced (this first stage may last a long time if there are too many files in the data directory). Without the "-F" option: the change tracking log is sent and applied to the mirror.

With the "-F" option: the entire data directory is resynced.

16. How do you create a database?


There are two ways to create a gpdb database: using a psql session or the Greenplum createdb utility.

Using a psql session:

  • gpdb=# \h create database

Command: CREATE DATABASE

Description: create a new database


  • CREATE DATABASE name [ [ WITH ] [ OWNER [=] db_owner ] [ TEMPLATE [=] template ] [ ENCODING [=] encoding ] [ TABLESPACE [=] tablespace ] [ CONNECTION LIMIT [=] connlimit ] ]

Using the createdb utility:

  • Usage: $GPHOME/bin/createdb --help


  • -D, --tablespace=TABLESPACE default tablespace for the database
  • -E, --encoding=ENCODING encoding for the database
  • -O, --owner=OWNER database user to own the new database
  • -T, --template=TEMPLATE template database to copy

17. How do I get a list of databases in a Greenplum cluster?


  • gpdb=# \l (lowercase letter "l")

List of databases

    Name | Owner | Encoding
    gpdb | gpadmin | UTF8
    gpperfmon | gpadmin | UTF8
    postgres | gpadmin | UTF8
    template0 | gpadmin | UTF8
    template1 | gpadmin | UTF8

Check the SQL below for more details on the databases.

  • gpdb=# select * from pg_database;

18. How do you delete/drop an existing database in Greenplum?


  • gpdb=# \h DROP DATABASE
  • Command: DROP DATABASE
  • Description: remove a database
  • Syntax: DROP DATABASE [ IF EXISTS ] name
  • Also check the dropdb utility:
  • $GPHOME/bin/dropdb --help
  • dropdb removes a PostgreSQL database.
  • Usage:
  • dropdb [OPTION]... DBNAME

19. Where can I get help on Postgres psql commands?


In a psql session:

  • "\?" for help on all psql session commands
  • "\h" followed by a SQL command name, for help on any SQL syntax

20. What is the Greenplum database?


21. gpstart failed. What should I do?


  • Check the gpstart log file in ~gpadmin/gpAdminLogs/gpstart_yyyymmdd.log
  • Take a look at the pg startup log file for more details in
  • $MASTER_DATA_DIRECTORY/pg_log/startup.log

22. Why do we need gpstop -m and gpstart -m?


The gpstart -m command lets you start the master only, without any of the data segments. It is used primarily by support to get system-level information and configuration. An end user would not regularly, or even normally, use it.

23. What is the procedure to get rid of mirror segments?


There are no utilities available to remove mirrors from Greenplum. You need to make sure all primary segments are good; then you can remove the mirror configuration from gp_configuration in 3.x.

24. How do you run gpcheckcat?


The gpcheckcat tool is used to check for catalog inconsistencies between the master and the segments. It can be found in the $GPHOME/bin/lib directory:

  • Usage: gpcheckcat
  • [dbname]
  • -?
  • -B parallel : number of worker threads
  • -g dir : generate SQL to rectify catalog corruption, put it in dir
  • -h host : DB host name
  • -p port : DB port number
  • -P passwd : DB password
  • -o : check OID consistency
  • -U uname : DB user name
  • -v : verbose


  • gpcheckcat gpdb >gpcheckcat_gpdb_logfile.

25. What is the difference between pg_dump and gp_dump?


  • pg_dump: non-parallel backup utility. You need a large file system, because the backup is created on the master node only.
  • gp_dump: parallel backup utility. The backup is created on the master and segment file systems.

26. What is gpdetective and how do I run it in Greenplum?


The gpdetective utility collects information from a running Greenplum Database system and creates a bzip2-compressed tar output file. This output file helps with the diagnosis of Greenplum Database errors or system failures. For more details, check the help: gpdetective --help

27. How do you delete a standby?


To remove the currently configured standby master host from your Greenplum Database system, run the following command on the master only: # gpinitstandby -r

28. How do you re-sync a standby?


Use this option if you already have a standby master configured and just want to resynchronize the data between the primary and backup master hosts. The Greenplum system catalog tables will not be updated. # gpinitstandby -n (resynchronize).

29. How do you add mirrors to the array?


The gpaddmirrors utility configures mirror segment instances for an existing Greenplum Database system that was initially configured with primary segment instances only.

For more details, check the help.

  • # gpaddmirrors --help

30. Describe the Greenplum endpoint architecture?


31. How do you see the primary-to-mirror mapping?


From the database catalog, the following query lists the configuration by content ID, so you can find the primary and mirror for each content.

  • gpdb=# select * from gp_configuration order by content;

Note: starting from GPDB 4.x, the gp_segment_configuration table is used instead.

  • gpdb=# select * from gp_segment_configuration order by dbid;

32. How do you run gpcheckperf IO/netperf?


Create a directory where you have free space and that is common to all hosts.

For network I/O, test each NIC:

  • gpcheckperf -f seg_host_file_nic-1 -r N -d /data/gpcheckperf > seg_host_file_nic_1.out
  • gpcheckperf -f seg_host_file_nic-2 -r N -d /data/gpcheckperf > seg_host_file_nic_2.out

For disk I/O:

33. How do you update postgresql.conf and reload it?


In GP 4.0, use the gpconfig utility to change postgresql.conf parameters. In 3.x versions, change the parameters manually in postgresql.conf. For more details, check the Greenplum Administrator's Guide.

34. How do you manage pg_hba.conf?


The pg_hba.conf file of the master instance controls client access and authentication to your Greenplum system. Check the Greenplum Administrator's Guide for instructions on adding or changing the contents of this file.

35. How would you implement compression, and can you explain the compression types?


Table level and column level

Table-level compression is applied to an entire table. Column-level compression is applied to a specific column. You can apply different column-level compression algorithms to different columns.

36. How do you add a new user to the database?


  • Use the createuser utility to create users. See createuser --help for more details.
  • You can also use SQL commands at the psql prompt to create users.
  • For example: CREATE USER or CREATE ROLE

37. How do you create a password-free trusted environment between all the segment hosts?


Use gpssh-exkeys

38. How do you check the DB version and the version at initdb?


To check the version:

  • psql> select version();
  • or
  • postgres --gp-version
  • To check the GP version at install:
  • psql> select * from gp_version_at_initdb;

39. What is VACUUM and when should I run it?


VACUUM reclaims storage occupied by deleted tuples. In normal GPDB operation, tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain on disk until a VACUUM is run. VACUUM should therefore be run periodically, especially on frequently updated tables.
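The behaviour described above can be illustrated with a toy model. This is only a sketch of the idea (deletes and updates leave dead tuples behind until a vacuum pass removes them), not how Greenplum stores data; the ToyTable class and its methods are invented for the example.

```python
class ToyTable:
    """Toy heap: DELETE only marks a tuple dead; VACUUM frees the slot."""

    def __init__(self):
        self.slots = []  # each slot is [value, is_dead]

    def insert(self, value):
        self.slots.append([value, False])

    def delete(self, value):
        for slot in self.slots:
            if slot[0] == value and not slot[1]:
                slot[1] = True  # marked dead, but still occupies storage

    def update(self, old, new):
        # An update obsoletes the old version and adds a new one.
        self.delete(old)
        self.insert(new)

    def vacuum(self):
        before = len(self.slots)
        self.slots = [s for s in self.slots if not s[1]]
        return before - len(self.slots)  # number of dead tuples reclaimed

    def live_rows(self):
        return [s[0] for s in self.slots if not s[1]]


table = ToyTable()
for v in ("a", "b", "c"):
    table.insert(v)
table.delete("a")
table.update("b", "b2")
slots_before = len(table.slots)  # dead tuples still take up slots here
reclaimed = table.vacuum()
print(slots_before, reclaimed, table.live_rows())
```

In this model a table that is updated often accumulates dead slots quickly, which is why such tables benefit most from regular vacuuming.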

40. An introduction to Greenplum?


42. What are resource queues?


Resource queues are used for Greenplum Database workload management. All users/queries can be prioritized using resource queues. See the Admin Guide for more details.

43. What is gp_toolkit?


gp_toolkit is a database schema that has many tables, views and functions to better manage Greenplum Database when the DB is up. In earlier 3.x versions, it was referred to as gp_jetpack.

44. What are the main differences between Oracle and Greenplum?


Oracle is a general-purpose relational database, whereas Greenplum is MPP by nature.

45. Tell me some of the aspects/implementations/configurations you have done in Greenplum?


Configuration such as port changes, Active Directory authentication, pg_hba.conf changes, postgresql.conf changes, gpfdist, etc.

46. What parameters can you use to manage workload in a Greenplum database?


Workload management is done by creating resource queues and assigning various limits.

47. How would you troubleshoot an issue/error/problem when there is no one available to help you, or you are all by yourself?


Look at the log files, and query the gpperfmon and toolkit schemas for various data and statistics.

48. What is good and bad about Greenplum compared to Oracle?


Greenplum is built on top of PostgreSQL. It is a shared-nothing, MPP architecture best suited to data warehousing environments, and good for big data analytics. Oracle is a general-purpose database.

49. What is Greenplum text mining data?


50. What would you do to gather statistics in the database, and to reclaim space?


Use ANALYZE to gather statistics and VACUUM to reclaim space.

A VACUUM FULL will reclaim all expired row space, but it is a very expensive operation and may take an unacceptably long time to finish on large, distributed Greenplum Database tables.

VACUUM FULL is not recommended in Greenplum Database.

51. What would you do when a user or users complain that a particular query is running slow?


  • Look at the query plan
  • Look at the network performance
  • Look at the resource queues
  • Look at the interconnect performance
  • Look at the query itself, i.e. whether it can be written in a more efficient manner

52. Which option would you use to export the DDL of the database or a table?


-s (-s | --schema-only: dump only the object definitions (schema), not data.)

53. When you restore from a backup taken with gp_dump, can you import a single table?


No, unless during the gp_dump you backed up one table only.

54. What is the difference between VACUUM and VACUUM FULL?


Unless you need to return space to the OS so that other tables or other parts of the system can use that space, you should use VACUUM rather than VACUUM FULL. VACUUM FULL is only needed when you have a table that is mostly dead rows, that is, the vast majority of its contents have been deleted. Even then, there is no point in using VACUUM FULL.

55. My SQL query is running very slow. It was running fine yesterday. What should I do?


  • Check that your connection to the Greenplum cluster is still good if you are using a remote client. You can do that by running the SQL locally on the GP cluster.
  • Check that the system tables and user tables involved are not bloated or skewed. Read the jetpack or Greenplum toolkit documentation about how to do this.
  • Check with your DBA that the Greenplum interconnect is still performing properly.
  • This can be done by checking for dropped packets on the interconnect with "netstat -i" and by running gpcheckperf. It is also possible that a segment is experiencing hardware problems, which can be found in the output of dmesg.
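One way to reason about the "skewed" check in the list above is to compare how many rows each segment holds. The sketch below assumes you have already collected per-segment row counts (for example by grouping a table's rows on the gp_segment_id system column); the function name and the percentage metric are illustrative choices, not the formula used by the toolkit views.

```python
def skew_report(rows_per_segment):
    """Compare the busiest segment with the average segment.

    Returns (average, maximum, skew_pct); skew_pct is how far the
    largest segment sits above the average, in percent.
    """
    if not rows_per_segment:
        raise ValueError("need at least one segment")
    average = sum(rows_per_segment) / len(rows_per_segment)
    maximum = max(rows_per_segment)
    skew_pct = 0.0 if average == 0 else (maximum - average) / average * 100.0
    return average, maximum, skew_pct


# Hypothetical per-segment row counts for two tables.
even = skew_report([1000, 1000, 1000, 1000])
skewed = skew_report([3000, 1000, 0, 0])
print(even, skewed)
```

A value near zero means the rows are spread evenly; a large value means one segment does most of the work, and a query can only finish as fast as its slowest segment.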

56. What backup options are available at the OS level?


Solaris: ZFS snapshots at the file system level.

All OS: gpcrondump / gp_dump.

57. What is gpcrondump?


A wrapper utility for gp_dump, which can be called directly or from a crontab entry.

Example: gpcrondump -x

58. How do I clone my production database to the PreProd/QA environment?


  • If Prod and QA are on the same GPDB cluster, use CREATE DATABASE with the TEMPLATE option.
  • If Prod and QA are on different clusters, use the backup and restore utilities.

59. Give an introduction to the Greenplum ETL tool?


60. What tools are available in Greenplum to take backups and restores?


For non-parallel backups: pg_dump (see question 25). Another useful command for getting data out of the database is COPY.

For parallel backups: gp_dump and gpcrondump for backups, and gp_restore for the restore process.

61. How do you generate DDL for a table?


Use the pg_dump utility to generate DDL.


  • pg_dump -t njonna.accounts -s -f ddl_accounts.sql


  • -t njonna.accounts is the table to dump.
  • -f ddl_accounts.sql is the output file.
  • -s dumps only the schema, no data.
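If the same dump has to be produced for several tables, the command above can be assembled programmatically. The following is a minimal sketch; the helper name ddl_dump_command is hypothetical, and the flags are exactly the -t, -s and -f options shown in the example.

```python
import shlex


def ddl_dump_command(table, outfile):
    """Build the pg_dump call that writes only the DDL of one table."""
    return ["pg_dump", "-t", table, "-s", "-f", outfile]


cmd = ddl_dump_command("njonna.accounts", "ddl_accounts.sql")
print(shlex.join(cmd))
```

Keeping the command as a list avoids shell-quoting problems when it is later handed to subprocess.run on a host with pg_dump installed.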

62. What is an MPP database?


An MPP database is a database optimized to be processed in parallel, so that many operations are performed by many processing units at a time. MPP (massively parallel processing) is the coordinated processing of a program by multiple processors working on different parts of the program.
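The idea can be sketched in a few lines: rows are spread across segments by a distribution key, each segment computes a partial result over its own rows, and the partial results are combined. The code below runs sequentially and uses a plain modulo in place of a real hash function, so it is only an illustration of the data flow; all names in it are invented for the example.

```python
from collections import defaultdict


def distribute(rows, key, num_segments):
    """Assign each row to a segment by its distribution key.

    A plain modulo stands in for the real hash function here.
    """
    segments = defaultdict(list)
    for row in rows:
        segments[row[key] % num_segments].append(row)
    return segments


def parallel_sum(segments, column):
    # Each segment computes a partial result over its own rows...
    partials = {seg: sum(r[column] for r in rows) for seg, rows in segments.items()}
    # ...and the coordinator combines the partial results.
    return partials, sum(partials.values())


sales = [{"id": i, "amount": 10 * i} for i in range(1, 9)]
segments = distribute(sales, "id", num_segments=4)
partials, total = parallel_sum(segments, "amount")
print(partials, total)
```

With an evenly distributed key each of the four segments handles a quarter of the rows, which is where the speed-up of an MPP system comes from.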

63. Is Hadoop a data lake?


To put it simply, Hadoop is a technology that can be used to build data lakes. A data lake is an architecture, whereas Hadoop is a component of that architecture. In other words, Hadoop is a platform for data lakes.

64. Which library is used by Greenplum for database analytics?


Apache MADlib is an open-source library for scalable in-database analytics. The Greenplum MADlib extension provides the ability to run machine learning and deep learning workloads in a Greenplum database.

65. Can Greenplum store unstructured data?


Pivotal Greenplum 5.0 includes several enhancements for semi-structured/unstructured data processing, including the GPText extension, the JSON data type, and improved XML data type support.

66. What is data lake storage?


A data lake is a storage repository that holds a vast amount of data in its native format until it is needed for analytics applications. While a traditional data warehouse stores data in hierarchical dimensions and tables, a data lake uses a flat architecture to store data, primarily in files or object storage.

67. What kind of database is Greenplum?


Greenplum Database is a massively parallel processing (MPP) database server with an architecture specially designed to manage large-scale analytic data warehouses and business intelligence workloads.

68. What is Greenplum Hadoop?


It is a massively parallel processing (MPP) database server with an architecture specially designed to manage large-scale analytic data warehouses and business intelligence workloads. It is based on PostgreSQL open-source technology.

69. What are the Greenplum UAP components?


70. What is the difference between PostgreSQL and Greenplum?


Greenplum is both a data warehouse and a transactional or operational data store. As a database, Greenplum employs a shared-nothing architecture, in contrast to PostgreSQL.

71. What is the location of pg_hba.conf, the log files, and the master data directory?


  • cd $MASTER_DATA_DIRECTORY: the master directory.
  • This is the location of pg_hba.conf and postgresql.conf, and of the various GPDB internal directories.
  • cd $MASTER_DATA_DIRECTORY/pg_log: the master database log files location.

72. What is Greenplum Performance Monitor and how do you install it?


It is a monitoring tool that collects statistics on system and query performance and builds historical data.

73. Can you explain the process of data migration from Oracle to Greenplum?


There are many ways. The simplest steps are: unload the data into CSV files, create tables in the Greenplum database corresponding to the Oracle tables, create an external table, start gpfdist pointing to the external table location, and load the data into Greenplum. You can also use the gpload utility; gpload creates an external table at runtime.
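The last steps of the process above (external table, then load) could be generated per table roughly as follows. This is a sketch under several assumptions: the target table already exists, gpfdist is already serving the directory holding the CSV file, table names are unqualified, and the host, port, file name and ext_ prefix are all made up for the example.

```python
def load_statements(table, gpfdist_host, gpfdist_port, csv_file):
    """SQL for loading one unloaded CSV file into an existing Greenplum table."""
    ext = "ext_" + table  # hypothetical naming convention for the external table
    location = "gpfdist://{}:{}/{}".format(gpfdist_host, gpfdist_port, csv_file)
    create_ext = (
        "CREATE EXTERNAL TABLE {ext} (LIKE {table}) "
        "LOCATION ('{loc}') FORMAT 'CSV';"
    ).format(ext=ext, table=table, loc=location)
    # Reading from the external table pulls the file through gpfdist.
    load = "INSERT INTO {table} SELECT * FROM {ext};".format(table=table, ext=ext)
    return [create_ext, load]


statements = load_statements("accounts", "etl1", 8081, "accounts.csv")
for statement in statements:
    print(statement)
```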

74. What is multi-version concurrency control?


Multi-version concurrency control (MVCC) is used to avoid unnecessary locking of the database. This removes the delay a user would otherwise face while waiting to access the database; that delay occurs when someone else is accessing the same content. All transactions are kept as a record.
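A heavily simplified model may help to show why readers do not need to wait for writers. Each row version records which transaction created it (xmin) and which one replaced it (xmax), and a reader only sees versions that match its snapshot. The sketch assumes every transaction is committed and uses a single number as the snapshot, which is far simpler than the real visibility rules.

```python
class Row:
    def __init__(self, value, xmin):
        self.value = value
        self.xmin = xmin      # transaction that created this version
        self.xmax = None      # transaction that deleted/replaced it, if any


def visible(row, snapshot_xid):
    """A version is visible if it was created at or before the snapshot
    and not deleted at or before it (all transactions assumed committed)."""
    created = row.xmin <= snapshot_xid
    deleted = row.xmax is not None and row.xmax <= snapshot_xid
    return created and not deleted


def read(versions, snapshot_xid):
    return [r.value for r in versions if visible(r, snapshot_xid)]


# Transaction 1 inserts a row; transaction 3 updates it.
old = Row("price=10", xmin=1)
versions = [old]
old.xmax = 3
versions.append(Row("price=12", xmin=3))

reader_before = read(versions, snapshot_xid=2)  # snapshot taken before the update
reader_after = read(versions, snapshot_xid=3)
print(reader_before, reader_after)
```

The reader with the older snapshot keeps seeing the old version without blocking the update, and the old version later becomes a dead tuple.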

75. What are the new features of PostgreSQL 9.1?


During the process of developing the project, one can never be certain which features will go in and which ones will not make the cut. The project has precise and stringent standards for quality, and some patches may or may not meet them before the set deadline.

76. How do you turn on timing and check how much time a query takes to execute?


  • You can turn on timing per session before you run your SQL with the \timing command.
  • You can run EXPLAIN ANALYZE against your SQL statement to get the timing.

77. Is Pivotal Greenplum free?


As mentioned throughout this post, Greenplum is an open-source database, so the community version is completely free to download and use.

78. How do you see the value of a GUC?


Connect to the GPDB database using psql and either query the catalog or use SHOW with the parameter name.


  • gpdb=# select name,setting from pg_settings where name='GUC';
  • gpdb=# show GUC;

79. What is the Greenplum physical architecture?



80. What are external tables in Greenplum?


An external table is a Greenplum Database table backed by data that resides outside of the database. You create a readable external table to read data from the external data source, and a writable external table to write data to the external source.

81. How do I create an external table in Greenplum?


  • CREATE EXTERNAL WEB TABLE log_output (linenum int, message text) EXECUTE '/var/load_scripts/' ON HOST FORMAT 'TEXT' (DELIMITER '|');

Create a writable external table named sales_out that uses gpfdist to write output data to a file named sales.out.
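The statement for the sales_out table is not shown in the answer, so the following is only a sketch of what such DDL might look like, generated from a small helper. The gpfdist host etl1, port 8081, the source table name sales, and the helper itself are assumptions made for illustration.

```python
def writable_external_table_ddl(name, like_table, host, port, filename, delimiter="|"):
    """Build a CREATE WRITABLE EXTERNAL TABLE statement that writes through gpfdist."""
    return (
        "CREATE WRITABLE EXTERNAL TABLE {name} (LIKE {like}) "
        "LOCATION ('gpfdist://{host}:{port}/{file}') "
        "FORMAT 'TEXT' (DELIMITER '{delim}');"
    ).format(name=name, like=like_table, host=host, port=port, file=filename, delim=delimiter)


# Hypothetical gpfdist host and port; sales is assumed to be an existing table.
ddl = writable_external_table_ddl("sales_out", "sales", "etl1", 8081, "sales.out")
print(ddl)
```

Rows inserted into sales_out would then be written by the segments, through gpfdist, to the sales.out file on the gpfdist host.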

82. What is gpfdist in Greenplum?


gpfdist is the Greenplum Database parallel file distribution program. It is used by readable external tables and gpload to serve external table files to all Greenplum Database segments in parallel.
