README.md: 7 additions & 0 deletions
@@ -349,6 +349,13 @@ Even more queries can be found [here](https://colab.research.google.com/github/R
 
 # Latest updates
 
+## Version 0.3.0 alpha 3
+- Added parameters to the jsoniq magic to select the desired output to print: -j, -df, -pdf
+- Added an informative error message, with a hint on how to fix the issue, when trying to get a DataFrame and no schema is available.
+- Added the -t parameter to the jsoniq magic to measure the response time.
+- The RumbleSession object now saves the latest result (a sequence of items) in a field called lastResult. This is particularly useful in notebooks for post-processing a result in Python after obtaining it through the jsoniq magic.
+- Improved static type detection when binding a pandas or PySpark DataFrame as an input variable to a JSONiq query.
+
 ## Version 0.2.0 alpha 2
 - You can change the result size cap through the now accessible Rumble configuration (for example rumble.getRumbleConf().setResultSizeCap(10)). This controls how many items can be retrieved at most with a json() call. You can increase it to any number you like if you reach the cap.
 - Added the JSONiq magic to execute JSONiq queries directly in a notebook cell, using the RumbleDB instance shipped with the library.
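The result size cap described in the 0.2.0 notes can be illustrated with a small sketch. FakeRumbleConf and materialize below are hypothetical stand-ins for the real configuration object returned by rumble.getRumbleConf(); only the capping behaviour is modelled, and the default cap value is an assumption.

```python
class FakeRumbleConf:
    """Hypothetical stand-in for the Rumble configuration object."""

    def __init__(self, cap=200):  # default cap value is an assumption
        self._cap = cap

    def setResultSizeCap(self, n):
        self._cap = n

    def getResultSizeCap(self):
        return self._cap


def materialize(items, conf):
    # A json()-style call materializes at most the configured number of items.
    return list(items)[: conf.getResultSizeCap()]


conf = FakeRumbleConf()
conf.setResultSizeCap(3)
print(materialize(range(10), conf))  # [0, 1, 2]
```

Raising the cap with setResultSizeCap is then simply a matter of calling it again with a larger number before the next retrieval.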
src/jsoniq/sequence.py: 28 additions & 0 deletions
@@ -2,8 +2,30 @@
 from pyspark.sql import SparkSession
 from pyspark.sql import DataFrame
 import json
+import sys
 
 class SequenceOfItems:
+    schema_str = """
+No DataFrame available as no schema was automatically detected. If you still believe the output is structured enough, you could add a schema and validate expression explicitly to your query.
+
+This is an example of how you can simply define a schema and wrap your query in a validate expression:
+
+declare type local:mytype as {
+    "product" : "string",
+    "store-number" : "int",
+    "quantity" : "decimal"
+};
+validate type local:mytype* {
+    for $product in json-lines("http://rumbledb.org/samples/products-small.json", 10)
+    where $product.quantity ge 995
+    return $product
+}
+
+RumbleDB keeps getting improved and automatic schema detection will improve as new versions get released. But even when RumbleDB fails to detect a schema, you can always declare your own schema as shown above.
+
+For more information, see the documentation at https://docs.rumbledb.org/rumbledb-reference/types
+"""
+
     def __init__(self, sequence, rumblesession):
         self._jsequence = sequence
         self._rumblesession = rumblesession
@@ -28,9 +50,15 @@ def rdd(self):
         return rdd.map(lambda l: json.loads(l))
 
     def df(self):
+        if (not "DataFrame" in self._jsequence.availableOutputs()):
+            sys.stderr.write("""No DataFrame available as no schema was detected. If you still believe the output is structured enough, you could add a schema and validate expression explicitly to your query.
+
+This is an example of how you can simply define a schema and wrap your query in a validate expression:
+
+declare type mytype as {
+    "product" : "string",
+    "store-number" : "int",
+    "quantity" : "decimal"
+};
+validate type mytype* {
+    for $product in json-lines("http://rumbledb.org/samples/products-small.json", 10)
+    where $product.quantity ge 995
+    return $product
+}
+""")
+            return None
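The essence of the guard in df() is that conversion is only attempted when "DataFrame" appears among the outputs the sequence reports as available; otherwise the schema hint is surfaced instead of an opaque failure. The following is a minimal, self-contained sketch of that pattern; FakeSequence and safe_df are illustrative stand-ins, not the library's API.

```python
import sys


class FakeSequence:
    """Illustrative stand-in for a sequence of items with queryable outputs."""

    def __init__(self, outputs, dataframe=None):
        self.outputs = outputs
        self.dataframe = dataframe

    def availableOutputs(self):
        return self.outputs


def safe_df(seq, hint="No DataFrame available: no schema was detected.\n"):
    # Only convert when the sequence actually offers a DataFrame output;
    # otherwise print the hint to stderr and return None instead of failing hard.
    if "DataFrame" not in seq.availableOutputs():
        sys.stderr.write(hint)
        return None
    return seq.dataframe


print(safe_df(FakeSequence(["JSON"])))  # None, with the hint on stderr
print(safe_df(FakeSequence(["JSON", "DataFrame"], dataframe="df")))  # df
```

Returning None rather than raising lets notebook callers check the result before calling show() or similar, which is the behaviour the magic-handler code below relies on.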
+    if (args.pyspark_data_frame):
+        df = response.df();
+        if df is not None:
+            df.show()
+
+    if (args.pandas_data_frame):
+        pdf = response.pdf()
+        if pdf is not None:
+            print(pdf)
+
+    if (args.apply_updates):
+        if ("PUL" in response.availableOutputs()):
+            response.applyPUL()
+            print("Updates applied successfully.")
+        else:
+            print("No Pending Update List (PUL) available to apply.")
+
+    if (args.json or (not args.pandas_data_frame and not args.pyspark_data_frame)):