magpie package

Contents

The Python API for Magpie.

Public classes:
  • MagpieContext: Main entry point for Magpie functionality.

  • MagpieNotebookContext: A context object with helper methods for interacting with Magpie in a notebook setting.

  • MagpieError: An error used to wrap Java exceptions thrown by Magpie commands.

  • MagpieInfo: Information about a Magpie session.

  • MagpieRows: A collection of rows returned from a Magpie command.

  • MagpieVariables: A class used for accessing variables.

  • MagpieSecrets: A class used for accessing secrets.

class magpie.MagpieContext(jmc, spark)

The main entry point for Magpie in Python.

The MagpieContext is used to execute commands, run SQL queries, get data frames for Magpie tables, and more.

Available as mc in Python script tasks and Python notebook blocks.

Example usage:

>>> flavors = ["apple", "banana", "strawberry"]
>>> df = mc.getTableDataFrame("store_sales")
>>> for f in flavors:
...   df.filter(df.flavor == f).createOrReplaceTempView("filtered_sales")
...   mc.sql("select * from filtered_sales")
...   mc.execute("save result as table %s_sales" % f)
about()

Return information about the current Magpie session.

>>> info = mc.about()
>>> info.organization
'Silectis'
Returns

current session MagpieInfo

clearVariable(name)

This function is deprecated and will be removed in a future Magpie release. Use mc.variables.remove(name) instead.

Unset a particular variable in the context.

>>> mc.setVariable("age", 5)
>>> mc.clearVariable("age")
>>> mc.hasVariable("age")
False
Parameters

name – Variable name

clearVariables()

This function is deprecated and will be removed in a future Magpie release. Use mc.variables.clear() instead.

Clear all variables in the context.

>>> mc.setVariable("age", 5)
>>> mc.clearVariables()
>>> mc.hasVariable("age")
False
execute(command)

Execute a single command and block until it completes.

>>> mc.execute("save result as table people")
Parameters

command – Command string to execute

Returns

Command result. Either None, a String (JSON), or MagpieRows.
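
Because execute() can return three different kinds of value, calling code usually has to branch on the result. The following is a minimal sketch of one way to do that, not part of the Magpie API: describe_result and FakeRows are hypothetical names, and FakeRows only imitates the rows and totalCount attributes documented for MagpieRows so the example can run outside a Magpie session.

```python
import json


def describe_result(result):
    """Return a short description of a value returned by mc.execute()."""
    if result is None:
        return "no result"
    if isinstance(result, str):
        # String results are documented as JSON; fall back to the raw text
        # if the string does not parse.
        try:
            return "json: %r" % (json.loads(result),)
        except ValueError:
            return "text: %s" % result
    # Anything else is assumed to look like MagpieRows (.rows, .totalCount).
    shown = len(result.rows)
    if result.totalCount is None:
        return "%d row(s)" % shown
    return "%d of %d row(s)" % (shown, result.totalCount)


class FakeRows:
    """Hypothetical stand-in for MagpieRows, for use outside Magpie."""

    def __init__(self, rows, totalCount=None):
        self.rows = rows
        self.totalCount = totalCount


print(describe_result(None))
print(describe_result(FakeRows([("Alice", 2), ("Bob", 5)], 100)))
```

In a Magpie session the same function could be applied directly, for example describe_result(mc.execute("show 2 from people with count")).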

getTableDataFrame(table)

Get the data frame for a Magpie table, optionally qualified by schema.

>>> df = mc.getTableDataFrame("people")
>>> df.show()
+-----+----+
| name| age|
+-----+----+
|Alice|   2|
|  Bob|   5|
+-----+----+
Parameters

table – table name

Returns

table pyspark.sql.DataFrame

getVariable(name)

This function is deprecated and will be removed in a future Magpie release. Use mc.variables.get(name) instead.

Get the value of a variable.

>>> mc.setVariable("age", 5)
>>> mc.getVariable("age")
5
Parameters

name – Variable name

Returns

Variable value

hasVariable(name)

This function is deprecated and will be removed in a future Magpie release. Use mc.variables.exists(name) instead.

Determine whether a variable is defined on the context.

>>> mc.setVariable("age", 5)
>>> mc.hasVariable("age")
True
Parameters

name – Variable name

Returns

Whether the variable is defined

interpret(command)

Execute the provided command and render the result visually.

Parameters

command – Command to execute

For example, to render the first 100 rows of the table people:

>>> mc.interpret("show 100 from people")
printVariables()

This function is deprecated and will be removed in a future Magpie release.

Get a listing of all variables defined on the context and their values.

>>> mc.setVariable("age", 5)
>>> mc.setVariable("color", "red")
>>> mc.printVariables()
  age = 5
  color = red
Returns

Variable listing
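
The deprecation notice for printVariables() names no replacement. One possible substitute, sketched here under the assumption that mc.variables.list() and mc.variables.get(name) behave as documented for MagpieVariables, is to build the listing by hand. snapshot_variables, format_variables and FakeVariables are hypothetical names; FakeVariables is an in-memory stand-in so the example runs outside Magpie.

```python
def snapshot_variables(variables):
    """Return {name: value} for every defined variable, sorted by name."""
    return {name: variables.get(name) for name in sorted(variables.list())}


def format_variables(variables):
    """Render one 'name = value' line per variable."""
    snapshot = snapshot_variables(variables)
    return "\n".join("  %s = %s" % (k, v) for k, v in snapshot.items())


class FakeVariables:
    """Hypothetical in-memory stand-in for mc.variables."""

    def __init__(self):
        self._values = {}

    def set(self, name, value):
        self._values[name] = value

    def get(self, name):
        return self._values[name]

    def list(self):
        return sorted(self._values)


variables = FakeVariables()
variables.set("color", "red")
variables.set("age", 5)
print(format_variables(variables))
```

In a session, format_variables(mc.variables) would be the equivalent call.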

profile(df)

Profile a data frame and render the profile visually.

Parameters

df – pyspark.sql.DataFrame to profile

>>> mc.profile(df)
result()

Get the last result as a Spark data frame.

>>> mc.sql("select * from people")
>>> df = mc.result()
>>> df.show()
+-----+----+
| name| age|
+-----+----+
|Alice|   2|
|  Bob|   5|
+-----+----+
Returns

last result pyspark.sql.DataFrame

setVariable(name, value)

This function is deprecated and will be removed in a future Magpie release. Use mc.variables.set(name, value) instead.

Set a variable to the given value in the context.

>>> mc.setVariable("age", 5)
>>> mc.getVariable("age")
5
Parameters
  • name – Variable name

  • value – Variable value

sql(sql)

Execute a SQL command, returning the resulting data frame.

>>> df = mc.sql("select distinct name from people")
>>> df.collect()
[Row(name='Alice'), Row(name='Bob')]
Parameters

sql – SQL statement

Returns

pyspark.sql.DataFrame result

substituteVariables(input)

This function is deprecated and will be removed in a future Magpie release.

Note: substitution is performed automatically on arguments passed to execute() and sql().

Substitute any variables present in the input string with current values stored in the context.

An exception is thrown if any variable in the input cannot be matched.

>>> mc.setVariable("age", 5)
>>> mc.substituteVariables("select * from people where age = $#age")
select * from people where age = 5
Parameters

input – Input string

Returns

String with variables substituted for their values

exception magpie.MagpieError(java_exception)

Exception propagated from Java when performing a Magpie action.
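
When several commands run in sequence, it can help to stop at the first failure and report which command raised it. The sketch below is one way to structure that. run_commands and FakeContext are hypothetical names, and the error type is passed in as a parameter so the example runs without the magpie package; in a real session it would presumably be magpie.MagpieError.

```python
def run_commands(mc, commands, error_type=Exception):
    """Execute commands in order, stopping at the first failure.

    Returns (results, failure), where failure is None on success or a
    (command, error) tuple for the command that raised.
    """
    results = []
    for command in commands:
        try:
            results.append(mc.execute(command))
        except error_type as err:
            return results, (command, err)
    return results, None


class FakeContext:
    """Hypothetical stand-in for mc that fails on commands starting with 'bad'."""

    def execute(self, command):
        if command.startswith("bad"):
            raise RuntimeError("simulated failure: " + command)
        return None


results, failure = run_commands(
    FakeContext(), ["save result as table people", "bad command", "never run"]
)
if failure is not None:
    print("failed on %r: %s" % failure)
```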

class magpie.MagpieInfo(info)

Information about a Magpie session.

property cluster

Current cluster name, if set

property instance

Cluster instance name

property name

Name of the application (Magpie)

property organization

Current organization name, if set

property project

Current project name, if set

property repository

Current repository name, if set

property schema

Current schema name, if set

property user

Current user name, if set

property version

Application version
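
Several of these properties are described as "if set". A small helper can collect only the populated ones, for example for logging. This is a sketch that assumes unset properties come back as None, which the reference does not state explicitly. session_summary is a hypothetical name, and SimpleNamespace stands in for a MagpieInfo object.

```python
from types import SimpleNamespace

# Property names as documented for MagpieInfo.
INFO_FIELDS = (
    "name", "version", "organization", "project", "repository",
    "schema", "cluster", "instance", "user",
)


def session_summary(info):
    """Return a dict of the MagpieInfo properties that are set.

    Assumption: an unset property is reported as None.
    """
    values = {field: getattr(info, field, None) for field in INFO_FIELDS}
    return {field: value for field, value in values.items() if value is not None}


# Stand-in for the object returned by mc.about().
info = SimpleNamespace(name="Magpie", organization="Silectis", schema=None)
print(session_summary(info))
```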

class magpie.MagpieNotebookContext(mc)

Magpie context wrapper with notebook utility methods.

Available as magpie in Python blocks in the notebook.

interpret(command)

This function is deprecated and will be removed in a future Magpie release. Use mc.interpret(command) instead.

Execute the provided command and render the result visually in the notebook

Parameters

command – Command to execute

For example, to render the first 100 rows of the table people in the notebook:

>>> magpie.interpret("show 100 from people")
profile(df)

This function is deprecated and will be removed in a future Magpie release. Use mc.profile(df) instead.

Profile a data frame and render the profile visually in the notebook

Parameters

df – pyspark.sql.DataFrame to profile

>>> magpie.profile(df)
class magpie.MagpieRows(rows, totalCount)

A collection of rows returned from a Magpie command, optionally including the total count of the source data set.

When a total count is requested:

>>> res = mc.execute("show 2 from people with count")
>>> res.rows
[Row(age=2, name='Alice'), Row(age=5, name='Bob')]
>>> res.totalCount
100

And when a total count is not requested:

>>> res2 = mc.execute("show 2 from people")
>>> res2.rows
[Row(age=2, name='Alice'), Row(age=5, name='Bob')]
>>> res2.totalCount
None
property rows

A list of pyspark.sql.Row returned by the command, with columns converted to strings for display

>>> res = mc.execute("show 2 from people")
>>> res.rows
[Row(age=2, name='Alice'), Row(age=5, name='Bob')]
property totalCount

Optionally, the total size of the source data set for the command

>>> res = mc.execute("show 2 from people with count")
>>> res.totalCount
100
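
Since totalCount is None unless a count was requested, code that compares it with the number of returned rows needs to handle the unknown case. A minimal sketch follows. is_truncated is a hypothetical helper, and FakeRows is a stand-in exposing only the rows and totalCount attributes documented above.

```python
from collections import namedtuple

# Hypothetical stand-in for MagpieRows, for use outside Magpie.
FakeRows = namedtuple("FakeRows", ["rows", "totalCount"])


def is_truncated(result):
    """Report whether the result holds fewer rows than the source data set.

    Returns None when no total count was requested, since the answer is
    unknown in that case.
    """
    if result.totalCount is None:
        return None
    return len(result.rows) < result.totalCount


with_count = FakeRows(rows=[("Alice", 2), ("Bob", 5)], totalCount=100)
without_count = FakeRows(rows=[("Alice", 2), ("Bob", 5)], totalCount=None)
print(is_truncated(with_count), is_truncated(without_count))
```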
class magpie.MagpieSecrets(jmc)

A context object for accessing Magpie Secrets in Python.

MagpieSecrets can be accessed via the secrets field of the MagpieContext.

Example usage:

>>> mc.secrets.list()
['api_key', 'password']
>>> mc.secrets.exists("api_key")
True
>>> mc.secrets.get("api_key")
my_api_key_contents
exists(name)

Checks if a secret by the given name exists.

A return value of True does not imply that the caller has permission to read or write the specified secret.

>>> mc.secrets.exists('api_key')
True
Parameters

name – secret name

Returns

whether the secret is defined

get(name)

Get the value of a secret.

>>> mc.secrets.get('api_key')
my_api_key_contents
Parameters

name – secret name

Returns

secret value
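
A common pattern is to check exists() before get() and to avoid writing the raw value to logs. The sketch below illustrates that pattern. get_secret, mask and FakeSecrets are hypothetical names, and FakeSecrets is an in-memory stand-in for mc.secrets. Note that exists() returning True does not guarantee read permission, so get() may still fail in a real session.

```python
def get_secret(secrets, name, default=None):
    """Return the secret's value, or default if no such secret exists."""
    if not secrets.exists(name):
        return default
    return secrets.get(name)


def mask(value, visible=2):
    """Hide all but the last `visible` characters, for safe logging."""
    text = str(value)
    if len(text) <= visible:
        return "*" * len(text)
    hidden = len(text) - visible
    return "*" * hidden + text[hidden:]


class FakeSecrets:
    """Hypothetical in-memory stand-in for mc.secrets."""

    def __init__(self, values):
        self._values = dict(values)

    def exists(self, name):
        return name in self._values

    def get(self, name):
        return self._values[name]


secrets = FakeSecrets({"api_key": "abcdef"})
print(mask(get_secret(secrets, "api_key")))
print(get_secret(secrets, "missing", default="n/a"))
```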

list()

Lists all secrets.

>>> mc.secrets.list()
['api_key', 'password']
Returns

a list of the names of all secrets

class magpie.MagpieVariables(jmc)

A context object used for accessing and updating Magpie Variables in Python.

MagpieVariables can be accessed via the variables field of the MagpieContext.

Example usage:

>>> mc.variables.set("my_var", 3)
>>> mc.variables.get("my_var")
3
>>> mc.variables.exists("my_var")
True
>>> mc.variables.clear()
clear()

Clears all variables set in the context.

>>> mc.variables.set("my_var", 3)
>>> mc.variables.clear()
>>> mc.variables.exists("my_var")
False
exists(name)

Determine whether a variable is defined in the context.

>>> mc.variables.set("my_var", 3)
>>> mc.variables.exists("my_var")
True
Parameters

name – variable name

Returns

whether the variable is defined

get(name)

Get the value of a variable.

>>> mc.variables.set("my_var", 3)
>>> mc.variables.get("my_var")
3
Parameters

name – variable name

Returns

variable value

list()

List all variables defined in the current context.

>>> mc.variables.set("age", 10)
>>> mc.variables.set("height", 55)
>>> mc.variables.list()
['age', 'height']
Returns

A list of variable names

remove(name)

Unset a particular variable in the context.

>>> mc.variables.set("my_var", 3)
>>> mc.variables.remove("my_var")
>>> mc.variables.exists("my_var")
False
Parameters

name – variable name
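
The set, get, exists and remove methods can be combined to scope a variable to a block of code and then restore the previous state. The following is a sketch of that idea using a context manager. temporary_variable and FakeVariables are hypothetical names, and FakeVariables is an in-memory stand-in for mc.variables.

```python
from contextlib import contextmanager


@contextmanager
def temporary_variable(variables, name, value):
    """Set a variable for the duration of a with-block, then restore it."""
    had_previous = variables.exists(name)
    previous = variables.get(name) if had_previous else None
    variables.set(name, value)
    try:
        yield
    finally:
        # Put back the earlier value, or remove the variable if it was unset.
        if had_previous:
            variables.set(name, previous)
        else:
            variables.remove(name)


class FakeVariables:
    """Hypothetical in-memory stand-in for mc.variables."""

    def __init__(self):
        self._values = {}

    def set(self, name, value):
        self._values[name] = value

    def get(self, name):
        return self._values[name]

    def exists(self, name):
        return name in self._values

    def remove(self, name):
        self._values.pop(name, None)


variables = FakeVariables()
with temporary_variable(variables, "my_var", 3):
    print(variables.get("my_var"))
print(variables.exists("my_var"))
```

In a session, the with-block would typically wrap calls such as mc.sql(...) that rely on the variable.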

set(name, value)

Set a variable to the given value in the context.

>>> mc.variables.set("my_var", 3)
>>> mc.variables.get("my_var")
3
Parameters
  • name – variable name

  • value – variable value

Returns