Have you ever wished that you could query a large JSON document? I wished that very thing recently while working with a 3rd party API. One particular API call was returning mountains of data, and I needed to get a list of distinct values of a particular property buried within the objects of this JSON response. I started trying to write a regex expression to do it, but quickly decided that one problem was enough.

My salvation came from an unexpected source: Microsoft SQL Server.

[more]

I was working on a data synchronization project, and one of my tasks was to produce a list of possible values for fields from one of the systems. I could get the data in question in JSON format via their API, but the result was something like this:

{
    "Id": 1111,
    "Type": "Human",
    "Fields": [
        {
            "Id": 1111,
            "Name": "TitleId",
            "Type": "Select",
            "Options": [
                {
                    "Id": 1,
                    "Name": "Mr. & Mrs."
                },
                {
                    "Id": 2,
                    "Name": "Other"
                },
                {
                    "Id": 3,
                    "Name": "Rev."
                },
                {
                    "Id": 4,
                    "Name": "Prof."
                },
                {
                    "Id": 5,
                    "Name": "Dr."
                },
                {
                    "Id": 6,
                    "Name": "Ms."
                },
                {
                    "Id": 7,
                    "Name": "Mrs."
                },
                {
                    "Id": 8,
                    "Name": "Mr."
                }
            ]
        },
        {
            "Id": 1112,
            "Name": "FirstName",
            "Type": "Text Input",
            "Required": true
        },
        {
            "Id": 1113,
            "Name": "MiddleName",
            "Type": "Text Input",
            "Required": false
        },
        {
            "Id": 1114,
            "Name": "LastName",
            "Type": "Text Input",
            "Required": true
        },
        {
            "Id": 1115,
            "Name": "Phone1",
            "Type": "Text Input",
            "Required": false
        },
        {
            "Id": 1116,
            "Name": "Phone2",
            "Type": "Text Input",
            "Required": false
        },
        {
            "Id": 1117,
            "Name": "IsMailingList",
            "Type": "Select Basic",
            "Required": false,
            "Options": [
                {
                    "Id": 1,
                    "Name": "Yes"
                },
                {
                    "Id": 2,
                    "Name": "No"
                }
            ]
        },
        {
            "Id": 1118,
            "Name": "BirthDate",
            "Type": "Date Input",
            "Required": false
        },
        {
            "Id": 1119,
            "Name": "Gender",
            "Type": "Select Basic",
            "Required": false,
            "Options": [
                {
                    "Id": 1,
                    "Name": "Male"
                },
                {
                    "Id": 2,
                    "Name": "Female"
                }
            ]
        },
        {
            "Id": 1120,
            "Name": "Veteran",
            "Type": "Select Basic",
            "Required": false,
            "Options": [
                {
                    "Id": 1,
                    "Name": "Yes"
                },
                {
                    "Id": 2,
                    "Name": "No"
                }
            ]
        },
        {
            "Id": 1121,
            "Name": "Needs",
            "Type": "Multiple Checkboxes",
            "Required": false,
            "Options": [
                {
                    "Id": 1,
                    "Name": "Food"
                },
                {
                    "Id": 2,
                    "Name": "Water"
                },
                {
                    "Id": 3,
                    "Name": "Air"
                },
                {
                    "Id": 4,
                    "Name": "Energy"
                }
            ]
        },
        {
            "Id": 1122,
            "Name": "Bed Preference",
            "Type": "Select Basic",
            "Required": false,
            "Options": [
                {
                    "Id": 1,
                    "Name": "Single"
                },
                {
                    "Id": 2,
                    "Name": "Double"
                },
                {
                    "Id": 3,
                    "Name": "Queen"
                },
                {
                    "Id": 4,
                    "Name": "King"
                },
                {
                    "Id": 5,
                    "Name": "Double Queen"
                },
                {
                    "Id": 6,
                    "Name": "Double King"
                }
            ]
        },
        {
            "Id": 1123,
            "Name": "CarPreferences",
            "Type": "Select Basic",
            "Required": false,
            "Options": [
                {
                    "Id": 1,
                    "Name": "Sedan"
                },
                {
                    "Id": 2,
                    "Name": "Coupe"
                },
                {
                    "Id": 3,
                    "Name": "Truck"
                },
                {
                    "Id": 4,
                    "Name": "Van"
                },
                {
                    "Id": 5,
                    "Name": "Hoverboard"
                },
                {
                    "Id": 6,
                    "Name": "Other"
                }
            ]
        }
    ]
}

That really doesn’t do the actual data justice (the actual data set was about 100 times that size!), but it gives you an idea of the basic model:

public class Field {
    id: int,
    name: string,
    type: string,
    options: Option[]
}

public class Option {
    id: int,
    name: string
}

My task was to produce a spreadsheet that listed each field and it’s valid options, if any.

It was late, and I was feeling especially lazy, so the brute-force method was not appealing at all.

Fortunately for lazy-me, I remembered that SQL Server can work with JSON data, and I just so happened to have SQL Server Management Studio open on one of my monitors already!

The first thing I did was get the fields portion of my JSON data into a VARCHAR(MAX) variable in a new query window…

DECLARE @Data VARCHAR(MAX) =
'
[
    {
        "Id": 1111,
        "Name": "TitleId",
        "Type": "Select",
        "Options": [
            {
                "Id": 1,
                "Name": "Mr. & Mrs."
            },
            {
                "Id": 2,
                "Name": "Other"
            },
            {
                "Id": 3,
                "Name": "Rev."
            },
            {
                "Id": 4,
                "Name": "Prof."
            },
            {
                "Id": 5,
                "Name": "Dr."
            },
            {
                "Id": 6,
                "Name": "Ms."
            },
            {
                "Id": 7,
                "Name": "Mrs."
            },
            {
                "Id": 8,
                "Name": "Mr."
            }
        ]
    },
    {
        "Id": 1112,
        "Name": "FirstName",
        "Type": "Text Input",
        "Required": true
    },
    {
        "Id": 1113,
        "Name": "MiddleName",
        "Type": "Text Input",
        "Required": false
    },
    {
        "Id": 1114,
        "Name": "LastName",
        "Type": "Text Input",
        "Required": true
    },
    {
        "Id": 1115,
        "Name": "Phone1",
        "Type": "Text Input",
        "Required": false
    },
    {
        "Id": 1116,
        "Name": "Phone2",
        "Type": "Text Input",
        "Required": false
    },
    {
        "Id": 1117,
        "Name": "IsMailingList",
        "Type": "Select Basic",
        "Required": false,
        "Options": [
            {
                "Id": 1,
                "Name": "Yes"
            },
            {
                "Id": 2,
                "Name": "No"
            }
        ]
    },
    {
        "Id": 1118,
        "Name": "BirthDate",
        "Type": "Date Input",
        "Required": false
    },
    {
        "Id": 1119,
        "Name": "Gender",
        "Type": "Select Basic",
        "Required": false,
        "Options": [
            {
                "Id": 1,
                "Name": "Male"
            },
            {
                "Id": 2,
                "Name": "Female"
            }
        ]
    },
    {
        "Id": 1120,
        "Name": "Veteran",
        "Type": "Select Basic",
        "Required": false,
        "Options": [
            {
                "Id": 1,
                "Name": "Yes"
            },
            {
                "Id": 2,
                "Name": "No"
            }
        ]
    },
    {
        "Id": 1121,
        "Name": "Needs",
        "Type": "Multiple Checkboxes",
        "Required": false,
        "Options": [
            {
                "Id": 1,
                "Name": "Food"
            },
            {
                "Id": 2,
                "Name": "Water"
            },
            {
                "Id": 3,
                "Name": "Air"
            },
            {
                "Id": 4,
                "Name": "Energy"
            }
        ]
    },
    {
        "Id": 1122,
        "Name": "Bed Preference",
        "Type": "Select Basic",
        "Required": false,
        "Options": [
            {
                "Id": 1,
                "Name": "Single"
            },
            {
                "Id": 2,
                "Name": "Double"
            },
            {
                "Id": 3,
                "Name": "Queen"
            },
            {
                "Id": 4,
                "Name": "King"
            },
            {
                "Id": 5,
                "Name": "Double Queen"
            },
            {
                "Id": 6,
                "Name": "Double King"
            }
        ]
    },
    {
        "Id": 1123,
        "Name": "CarPreferences",
        "Type": "Select Basic",
        "Required": false,
        "Options": [
            {
                "Id": 1,
                "Name": "Sedan"
            },
            {
                "Id": 2,
                "Name": "Coupe"
            },
            {
                "Id": 3,
                "Name": "Truck"
            },
            {
                "Id": 4,
                "Name": "Van"
            },
            {
                "Id": 5,
                "Name": "Hoverboard"
            },
            {
                "Id": 6,
                "Name": "Other"
            }
        ]
    }
]
'

I then used the OPENJSON function to convert my data into a queryable JSON rowset:

SELECT	
	*
FROM
	OPENJSON(@Data)

This gave me one row per field, which was a start, but it wasn’t very useful yet. Here’s what that looked like:

What I needed to do was parse the JSON up so that I could effectively query it.

OPENJSON allows you to describe the shape of the JSON data, and using that, I was able to convert it to something that was closer to what I wanted.

With my updated OPENJSON query…

SELECT	
	*
FROM
	OPENJSON(@Data)
		WITH(
			[id] INT 'strict $.Id',
			[Name] VARCHAR(MAX) '$.Name',
			[Options] NVARCHAR(MAX) '$.Options' AS JSON
		)

I was able to actually get some structured data out of it and into a rowset!

I now had the names of each field, but the options themselves were still trapped in a JSON blob. I needed to go further.

First, I parsed the fields themselves into a temp table…

SELECT	
	id ,
    Name ,
    Options
INTO #FieldsWithOptionsBlob
FROM
	OPENJSON(@Data)
		WITH(
			[id] INT 'strict $.Id',
			[Name] VARCHAR(MAX) '$.Name',
			[Options] NVARCHAR(MAX) '$.Options' AS JSON
		)

I then parsed apart the options and loaded them up into a temp table. This proved a bit tricky. I had to use the CROSS APPLY operator along with the OPENJSON function:

SELECT
	f.Id,
	opt.Name
INTO #OptionValues
FROM
	#FieldsWithOptionsBlob f
	CROSS APPLY OPENJSON(f.Options, '$') 
	WITH (
		[id] INT 'strict $.Id',
		[Name] VARCHAR(MAX) '$.Name'
	)
	AS opt

Sidenote: CROSS APPLY is neat. Read more about it at MSSQLTips.

This query gave me the following results:

I now had a row for each option and corresponding field ID. All that was left was to put it all back together!

For that, I followed this approach from Codepal and used a CTE, combined with the good ole FOR XML trick for making CSV columns:

;WITH cte AS
    ( 
    SELECT
		f.Id,
        FieldName=f.Name,
		OptionName=o.Name
    FROM
        #FieldsWithOptionsBlob f
		LEFT JOIN #OptionValues o
			ON o.Id = f.Id
    )
SELECT
	f.Id,
	f.Name,
	STUFF(o.Options, 1, 1, '')
FROM
	#FieldsWithOptionsBlob f
	CROSS APPLY
	(
		SELECT
			', ' + cte.OptionName
		FROM
			cte
		WHERE
			cte.Id = f.Id
		FOR XML PATH('')
	) o(Options)

And that gave me what I wanted:

I’ll admit that it took me a bit to figure out the CROSS APPLY part, but stumbling through that sure beat manually trying to put this spreadsheet together! ?