Have you ever wished that you could query a large JSON document? I wished that very thing recently while working with a 3rd party API. One particular API call was returning mountains of data, and I needed to get a list of distinct values of a particular property buried within the objects of this JSON response. I started trying to write a regex expression to do it, but quickly decided that one problem was enough.
My salvation came from an unexpected source: Microsoft SQL Server.
[more]
I was working on a data synchronization project, and one of my tasks was to produce a list of possible values for fields from one of the systems. I could get the data in question in JSON format via their API, but the result was something like this:
{
"Id": 1111,
"Type": "Human",
"Fields": [
{
"Id": 1111,
"Name": "TitleId",
"Type": "Select",
"Options": [
{
"Id": 1,
"Name": "Mr. & Mrs."
},
{
"Id": 2,
"Name": "Other"
},
{
"Id": 3,
"Name": "Rev."
},
{
"Id": 4,
"Name": "Prof."
},
{
"Id": 5,
"Name": "Dr."
},
{
"Id": 6,
"Name": "Ms."
},
{
"Id": 7,
"Name": "Mrs."
},
{
"Id": 8,
"Name": "Mr."
}
]
},
{
"Id": 1112,
"Name": "FirstName",
"Type": "Text Input",
"Required": true
},
{
"Id": 1113,
"Name": "MiddleName",
"Type": "Text Input",
"Required": false
},
{
"Id": 1114,
"Name": "LastName",
"Type": "Text Input",
"Required": true
},
{
"Id": 1115,
"Name": "Phone1",
"Type": "Text Input",
"Required": false
},
{
"Id": 1116,
"Name": "Phone2",
"Type": "Text Input",
"Required": false
},
{
"Id": 1117,
"Name": "IsMailingList",
"Type": "Select Basic",
"Required": false,
"Options": [
{
"Id": 1,
"Name": "Yes"
},
{
"Id": 2,
"Name": "No"
}
]
},
{
"Id": 1118,
"Name": "BirthDate",
"Type": "Date Input",
"Required": false
},
{
"Id": 1119,
"Name": "Gender",
"Type": "Select Basic",
"Required": false,
"Options": [
{
"Id": 1,
"Name": "Male"
},
{
"Id": 2,
"Name": "Female"
}
]
},
{
"Id": 1120,
"Name": "Veteran",
"Type": "Select Basic",
"Required": false,
"Options": [
{
"Id": 1,
"Name": "Yes"
},
{
"Id": 2,
"Name": "No"
}
]
},
{
"Id": 1121,
"Name": "Needs",
"Type": "Multiple Checkboxes",
"Required": false,
"Options": [
{
"Id": 1,
"Name": "Food"
},
{
"Id": 2,
"Name": "Water"
},
{
"Id": 3,
"Name": "Air"
},
{
"Id": 4,
"Name": "Energy"
}
]
},
{
"Id": 1122,
"Name": "Bed Preference",
"Type": "Select Basic",
"Required": false,
"Options": [
{
"Id": 1,
"Name": "Single"
},
{
"Id": 2,
"Name": "Double"
},
{
"Id": 3,
"Name": "Queen"
},
{
"Id": 4,
"Name": "King"
},
{
"Id": 5,
"Name": "Double Queen"
},
{
"Id": 6,
"Name": "Double King"
}
]
},
{
"Id": 1123,
"Name": "CarPreferences",
"Type": "Select Basic",
"Required": false,
"Options": [
{
"Id": 1,
"Name": "Sedan"
},
{
"Id": 2,
"Name": "Coupe"
},
{
"Id": 3,
"Name": "Truck"
},
{
"Id": 4,
"Name": "Van"
},
{
"Id": 5,
"Name": "Hoverboard"
},
{
"Id": 6,
"Name": "Other"
}
]
}
]
}
That really doesn’t do the actual data justice (the actual data set was about 100 times that size!), but it gives you an idea of the basic model:
public class Field {
id: int,
name: string,
type: string,
options: Option[]
}
public class Option {
id: int,
name: string
}
My task was to produce a spreadsheet that listed each field and it’s valid options, if any.
It was late, and I was feeling especially lazy, so the brute-force method was not appealing at all.
Fortunately for lazy-me, I remembered that SQL Server can work with JSON data, and I just so happened to have SQL Server Management Studio open on one of my monitors already!
The first thing I did was get the fields
portion of my JSON data into a VARCHAR(MAX)
variable in a new query window…
DECLARE @Data VARCHAR(MAX) =
'
[
{
"Id": 1111,
"Name": "TitleId",
"Type": "Select",
"Options": [
{
"Id": 1,
"Name": "Mr. & Mrs."
},
{
"Id": 2,
"Name": "Other"
},
{
"Id": 3,
"Name": "Rev."
},
{
"Id": 4,
"Name": "Prof."
},
{
"Id": 5,
"Name": "Dr."
},
{
"Id": 6,
"Name": "Ms."
},
{
"Id": 7,
"Name": "Mrs."
},
{
"Id": 8,
"Name": "Mr."
}
]
},
{
"Id": 1112,
"Name": "FirstName",
"Type": "Text Input",
"Required": true
},
{
"Id": 1113,
"Name": "MiddleName",
"Type": "Text Input",
"Required": false
},
{
"Id": 1114,
"Name": "LastName",
"Type": "Text Input",
"Required": true
},
{
"Id": 1115,
"Name": "Phone1",
"Type": "Text Input",
"Required": false
},
{
"Id": 1116,
"Name": "Phone2",
"Type": "Text Input",
"Required": false
},
{
"Id": 1117,
"Name": "IsMailingList",
"Type": "Select Basic",
"Required": false,
"Options": [
{
"Id": 1,
"Name": "Yes"
},
{
"Id": 2,
"Name": "No"
}
]
},
{
"Id": 1118,
"Name": "BirthDate",
"Type": "Date Input",
"Required": false
},
{
"Id": 1119,
"Name": "Gender",
"Type": "Select Basic",
"Required": false,
"Options": [
{
"Id": 1,
"Name": "Male"
},
{
"Id": 2,
"Name": "Female"
}
]
},
{
"Id": 1120,
"Name": "Veteran",
"Type": "Select Basic",
"Required": false,
"Options": [
{
"Id": 1,
"Name": "Yes"
},
{
"Id": 2,
"Name": "No"
}
]
},
{
"Id": 1121,
"Name": "Needs",
"Type": "Multiple Checkboxes",
"Required": false,
"Options": [
{
"Id": 1,
"Name": "Food"
},
{
"Id": 2,
"Name": "Water"
},
{
"Id": 3,
"Name": "Air"
},
{
"Id": 4,
"Name": "Energy"
}
]
},
{
"Id": 1122,
"Name": "Bed Preference",
"Type": "Select Basic",
"Required": false,
"Options": [
{
"Id": 1,
"Name": "Single"
},
{
"Id": 2,
"Name": "Double"
},
{
"Id": 3,
"Name": "Queen"
},
{
"Id": 4,
"Name": "King"
},
{
"Id": 5,
"Name": "Double Queen"
},
{
"Id": 6,
"Name": "Double King"
}
]
},
{
"Id": 1123,
"Name": "CarPreferences",
"Type": "Select Basic",
"Required": false,
"Options": [
{
"Id": 1,
"Name": "Sedan"
},
{
"Id": 2,
"Name": "Coupe"
},
{
"Id": 3,
"Name": "Truck"
},
{
"Id": 4,
"Name": "Van"
},
{
"Id": 5,
"Name": "Hoverboard"
},
{
"Id": 6,
"Name": "Other"
}
]
}
]
'
I then used the OPENJSON
function to convert my data into a queryable JSON rowset:
SELECT
*
FROM
OPENJSON(@Data)
This gave me one row per field, which was a start, but it wasn’t very useful yet. Here’s what that looked like:
What I needed to do was parse the JSON up so that I could effectively query it.
OPENJSON
allows you to describe the shape of the JSON data, and using that, I was able to convert it to something that was closer to what I wanted.
With my updated OPENJSON query…
SELECT
*
FROM
OPENJSON(@Data)
WITH(
[id] INT 'strict $.Id',
[Name] VARCHAR(MAX) '$.Name',
[Options] NVARCHAR(MAX) '$.Options' AS JSON
)
I was able to actually get some structured data out of it and into a rowset!
I now had the names of each field, but the options themselves were still trapped in a JSON blob. I needed to go further.
First, I parsed the fields themselves into a temp table…
SELECT
id ,
Name ,
Options
INTO #FieldsWithOptionsBlob
FROM
OPENJSON(@Data)
WITH(
[id] INT 'strict $.Id',
[Name] VARCHAR(MAX) '$.Name',
[Options] NVARCHAR(MAX) '$.Options' AS JSON
)
I then parsed apart the options and loaded them up into a temp table. This proved a bit tricky. I had to use the CROSS APPLY
operator along with the OPENJSON
function:
SELECT
f.Id,
opt.Name
INTO #OptionValues
FROM
#FieldsWithOptionsBlob f
CROSS APPLY OPENJSON(f.Options, '$')
WITH (
[id] INT 'strict $.Id',
[Name] VARCHAR(MAX) '$.Name'
)
AS opt
Sidenote:
CROSS APPLY
is neat. Read more about it at MSSQLTips.
This query gave me the following results:
I now had a row for each option and corresponding field ID. All that was left was to put it all back together!
For that, I followed this approach from Codepal and used a CTE, combined with the good ole FOR XML
trick for making CSV columns:
;WITH cte AS
(
SELECT
f.Id,
FieldName=f.Name,
OptionName=o.Name
FROM
#FieldsWithOptionsBlob f
LEFT JOIN #OptionValues o
ON o.Id = f.Id
)
SELECT
f.Id,
f.Name,
STUFF(o.Options, 1, 1, '')
FROM
#FieldsWithOptionsBlob f
CROSS APPLY
(
SELECT
', ' + cte.OptionName
FROM
cte
WHERE
cte.Id = f.Id
FOR XML PATH('')
) o(Options)
And that gave me what I wanted:
I’ll admit that it took me a bit to figure out the CROSS APPLY
part, but stumbling through that sure beat manually trying to put this spreadsheet together! ?