Data Provenance

Description

JSON files are a common way to store and share data. I often need to write out just the values, for example to quickly spot trends, duplicates, or unexpected values. I could write an individual script for each of these tasks, but one common approach to printing values has proven far more useful. Printing only the values immediately raises the question, "Where did this value come from?" For that reason, I also print the chain of keys leading to each value. This gives a simple form of "data provenance": I can quickly see not only each value but also the context in which it exists.

Running the Program

This program is run using the following command structure:

go run .\data_provenance.go
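
The command above takes no arguments because the input path is hard-coded in main (./test_2.json in the listing further down). If switching between test files by editing the source becomes tedious, one option is to read the path from the command line. The sketch below is an assumption about how that could look, not part of the original program; inputPath and defaultPath are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
)

// defaultPath mirrors the path that data_provenance.go currently hard-codes.
const defaultPath = "./test_2.json"

// inputPath is a hypothetical helper: it returns the first argument after
// the program name, or defaultPath when no argument was given.
func inputPath(args []string) string {
	if len(args) > 1 {
		return args[1]
	}
	return defaultPath
}

func main() {
	// The real program would pass this path to os.ReadFile.
	fmt.Println("Input file:", inputPath(os.Args))
}
```

With this in place, a command such as go run .\data_provenance.go .\test_1.json would select the other test file, and running without an argument would keep the current behavior.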

File Structure

For this project, files are organized into a single folder in the following way:

data_provenance/
├── data_provenance.go
├── test_1.json
└── test_2.json

The Code

The core logic of this program is contained in the data_provenance.go file. The test_1.json and test_2.json files are simply example files that can be used to test the program and verify its functionality. The code reads in a test JSON file, parses it, and then recursively traverses the structure to print each value along with the keys leading to it. Keys are joined with "/", and array elements appear as their index in brackets, such as [0].
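
To make the output format concrete, here is a minimal sketch that runs the same kind of traversal over a small inline document. The input JSON and the names walk and flatten are made up for illustration and are not part of data_provenance.go. Each object in the input has a single key so that the output order is fixed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// walk is a hypothetical, trimmed-down stand-in for the traversal described
// above. It appends one "[path]: [value]" line for every leaf it reaches.
func walk(path string, v any, lines []string) []string {
	switch item := v.(type) {
	case map[string]any:
		for k, child := range item {
			lines = walk(path+"/"+k, child, lines)
		}
	case []any:
		for i, child := range item {
			lines = walk(fmt.Sprintf("%s/[%d]", path, i), child, lines)
		}
	default:
		lines = append(lines, fmt.Sprintf("[%s]: [%v]", path, item))
	}
	return lines
}

// flatten parses JSON text whose top level is an object and walks each key.
func flatten(input string) ([]string, error) {
	var doc map[string]any
	if err := json.Unmarshal([]byte(input), &doc); err != nil {
		return nil, err
	}
	var lines []string
	for k, v := range doc {
		lines = walk(k, v, lines)
	}
	return lines, nil
}

func main() {
	lines, err := flatten(`{"servers": [{"host": "alpha"}, {"port": 8080}, true]}`)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, line := range lines {
		fmt.Println(line)
	}
	// Prints:
	// [servers/[0]/host]: [alpha]
	// [servers/[1]/port]: [8080]
	// [servers/[2]]: [true]
}
```

Each line pairs the full key path with the value found there, so a value such as 8080 can be traced back to the second element of servers.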

Below is the code for the data_provenance.go file.

package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// expand walks a decoded JSON value and appends one {key path: value} entry
// to values for every leaf. keyPath is the chain of keys leading to v.
func expand(keyPath string, v any, values []map[string]string) []map[string]string {
	switch item := v.(type) {
	case map[string]any:
		// JSON object: extend the path with each key and recurse.
		for k, child := range item {
			values = expand(fmt.Sprintf("%s/%s", keyPath, k), child, values)
		}
	case []any:
		// JSON array: extend the path with the element index, e.g. "[0]".
		for i, child := range item {
			values = expand(fmt.Sprintf("%s/[%d]", keyPath, i), child, values)
		}
	case string, float64, bool:
		// Leaf value: record it under its full key path.
		values = append(values, map[string]string{keyPath: fmt.Sprintf("%v", item)})
	case nil:
		// JSON null decodes to nil; record it instead of reporting an error.
		values = append(values, map[string]string{keyPath: "null"})
	default:
		// Not expected for data decoded by encoding/json; kept as a safety net.
		fmt.Printf("\n\n\n==============================\n")
		fmt.Printf("<<< ERROR >>>\n")
		fmt.Printf("Unable to properly parse value:\n%v\n", v)
		fmt.Printf("==============================\n\n\n")
	}

	return values
}

func main() {
	fmt.Printf("Program Started.\n\n")

	values := []map[string]string{}

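	filepath := "./test_2.json"
	data, err := os.ReadFile(filepath)
	if err != nil {
		// Report why the file could not be read instead of exiting silently.
		fmt.Fprintf(os.Stderr, "Error reading %s: %v\n", filepath, err)
		os.Exit(1)
	}

	// The top level of the file is expected to be a JSON object.
	var result map[string]any
	err = json.Unmarshal(data, &result)
	if err != nil {
		fmt.Fprintf(os.Stderr, "Error parsing %s: %v\n", filepath, err)
		os.Exit(1)
	}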

	// Go does not fix map iteration order, so line order may vary between runs.
	for k, v := range result {
		values = expand(k, v, values)
	}

	for _, valueMap := range values {
		for k, v := range valueMap {
			fmt.Printf("[%s]: [%s]\n", k, v)
		}
	}

	fmt.Printf("\nProgram Completed.\n")
}
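
One thing to be aware of: Go does not guarantee the order in which a map is ranged over, so the program above may print the lines for the same file in a different order on each run. If stable output matters, for example when comparing two runs side by side, one option is to visit object keys in sorted order. The sketch below is an assumption about how that could look, not part of the original program; join, expandSorted, and flattenSorted are hypothetical names, and the input is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// join appends a path segment, avoiding a leading "/" at the top level.
func join(path, segment string) string {
	if path == "" {
		return segment
	}
	return path + "/" + segment
}

// expandSorted walks a decoded JSON value and visits object keys in sorted
// order, so the resulting lines come out the same on every run.
func expandSorted(path string, v any, lines []string) []string {
	switch item := v.(type) {
	case map[string]any:
		keys := make([]string, 0, len(item))
		for k := range item {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		for _, k := range keys {
			lines = expandSorted(join(path, k), item[k], lines)
		}
	case []any:
		for i, child := range item {
			lines = expandSorted(join(path, fmt.Sprintf("[%d]", i)), child, lines)
		}
	default:
		lines = append(lines, fmt.Sprintf("[%s]: [%v]", path, item))
	}
	return lines
}

// flattenSorted parses JSON text and returns its "[path]: [value]" lines.
func flattenSorted(input string) ([]string, error) {
	var doc any
	if err := json.Unmarshal([]byte(input), &doc); err != nil {
		return nil, err
	}
	return expandSorted("", doc, nil), nil
}

func main() {
	lines, err := flattenSorted(`{"b": 1, "a": {"d": "x", "c": [true, 2.5]}}`)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, line := range lines {
		fmt.Println(line)
	}
	// Prints:
	// [a/c/[0]]: [true]
	// [a/c/[1]]: [2.5]
	// [a/d]: [x]
	// [b]: [1]
}
```

Sorting costs a little extra work per object, but it keeps related paths next to each other, which also makes duplicates easier to spot.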