IMO if you're even slightly concerned about storage you should be using a DBMS instead of JSON files. They will handle sparse data, compression, and fast access better than a text-based file format.
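As a rough illustration of the DBMS suggestion, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `plates` and the columns `plate_number` and `special_case` are made up for this example; the point is only that a rare field can live in a nullable, indexed column instead of being repeated in every JSON record.

```python
import sqlite3

# Hypothetical schema: one row per plate, with the rare "special case"
# stored in a nullable column (NULL for ordinary plates).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE plates ("
    " plate_number TEXT PRIMARY KEY,"
    " special_case TEXT"
    ")"
)
conn.executemany(
    "INSERT INTO plates VALUES (?, ?)",
    [("ABC123", None), ("GOV001", "government"), ("XYZ789", None)],
)

# An index lets the database find the rare rows without scanning every record.
conn.execute("CREATE INDEX idx_special ON plates (special_case)")

special = conn.execute(
    "SELECT plate_number, special_case FROM plates"
    " WHERE special_case IS NOT NULL"
).fetchall()
print(special)
```

Whether this beats JSON files depends on the data size and access pattern; for a few hundred records the difference is unlikely to matter.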
If it's something that represents mutually exclusive states, like the license plate examples (Gov't, Embassy, Learner), an enum like 4wd mentioned is a better idea than many boolean keys. This would also be the switch/case question you posed. For a "regular case", I would include that in the enum, but if you create an enum that only contains "special cases", you can always set it to null.
On the case of booleans, I would suggest avoiding them unless it is necessary, and truly a binary (as in, two-option, not binary numbers), self-contained-in-one-key thing (obligatory anti-boolean video). If the use case is to say what a different key's object represents, you don't need it (see: enums. You'll thank yourself later if you add a third option). If the use case for using it is saying another key contains value(s), you don't need it. Many languages can handle the idea of "data is present, or not present" (either with "truthy/falsey" behavior interpreting "data-or-null", or "Maybe/Option" types), so often "data-or-null" can suffice instead of booleans.
I would suggest trying to always include all keys of a present object, even if a key's value is null or not applicable. It will prevent headaches later when code tries to access a key that isn't present. This approach might also help you decide to reduce the number of keys, if they could be consolidated (as in taking booleans and converting them to a state-like enum, as mentioned above), or removed (if unused and/or deprecated).
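To make the "enum or null, with every key always present" idea concrete, here is a small Python sketch. The names `SpecialCase`, `make_record`, `plateNumber`, and `specialCase` are assumptions invented for this example, not anything from the original question.

```python
import json
from enum import Enum
from typing import Optional


class SpecialCase(Enum):
    """Hypothetical enum that only lists the special cases."""
    GOVERNMENT = "government"
    EMBASSY = "embassy"
    LEARNER = "learner"


def make_record(plate_number: str, special: Optional[SpecialCase]) -> dict:
    # Every key is always present; a regular plate carries null
    # instead of a pile of false booleans.
    return {
        "plateNumber": plate_number,
        "specialCase": special.value if special is not None else None,
    }


regular = make_record("ABC123", None)
learner = make_record("LRN042", SpecialCase.LEARNER)
print(json.dumps(regular))
print(json.dumps(learner))
```

Adding a fourth case later means adding one enum member, rather than another boolean key on every record.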
Though I know very little about enums and have never used them before, I think this is what I needed. I couldn't imagine a type would exist exactly for this purpose, since I might add or deprecate data later on. I will need some time to understand how to restructure the current JSON object to accommodate enums, but I think it will be worth it. Thanks for your time!
When the enum reaches your JSON, it will have to be a string (as JSON does not have a dedicated "enum" type). But it at least ensures that languages parsing your JSON will have a consistent set of strings to read.
Consider this small bit of Elm code (you may not be an Elm dev, and that's okay; it's the concept you should look to get):
-- A Directions "enum" type with four options:
-- North, East, South, West
type Directions
    = North
    | East
    | South
    | West

-- How to turn each Directions into a String
-- Which can then be encoded in JSON
directionsToString : Directions -> String
directionsToString direction =
    case direction of
        North -> "north"
        East -> "east"
        South -> "south"
        West -> "west"

-- "Maybe Directions" since not all strings can be parsed as a Directions.
-- The return will be "Just <something>" or "Nothing"
directionsFromString : String -> Maybe Directions
directionsFromString dirString =
    case dirString of
        "north" -> Just North
        "east" -> Just East
        "south" -> Just South
        "west" -> Just West
        _ -> Nothing
The two functions (directionsFromString and directionsToString) are ready to be used as part of JSON handling: to read a String from a key and turn it into a Directions enum member, or to turn a Directions into a String and insert that string as a key's value.
But all that aside, for your restructuring, and keeping with the license plate example, both type and license number could be contained in a small object. For example:
{
    ...
    "licensePlate": {
        "type": "government",    <- an enum in the language parsing this,
                                    but a string in JSON
        "plateNumber": "ABC123"
        ...
    }
    ...
}
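For readers who don't use Elm, here is roughly how decoding that small object might look in Python. This is only a sketch: `PlateType`, `LicensePlate`, and `parse_license_plate` are hypothetical names, and the list of enum members is an assumption. Returning `None` for an unknown string plays the same role as `Nothing` in the Elm code.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PlateType(Enum):
    """Hypothetical enum; here the regular case is a member too."""
    REGULAR = "regular"
    GOVERNMENT = "government"
    EMBASSY = "embassy"
    LEARNER = "learner"


@dataclass
class LicensePlate:
    type: PlateType
    plate_number: str


def parse_license_plate(obj: dict) -> Optional[LicensePlate]:
    # Not every string is a valid PlateType, so parsing can fail.
    try:
        plate_type = PlateType(obj["type"])
    except (KeyError, ValueError):
        return None
    number = obj.get("plateNumber")
    if not isinstance(number, str):
        return None
    return LicensePlate(plate_type, number)


doc = json.loads('{"licensePlate": {"type": "government", "plateNumber": "ABC123"}}')
plate = parse_license_plate(doc["licensePlate"])
unknown = parse_license_plate({"type": "spaceship", "plateNumber": "ZZZ999"})
print(plate, unknown)
```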
If storage space is important, uncompressed JSON is a bad choice. If you're compressing the JSON, it doesn't really matter if you have lots of exceptionCase: False fields, as they will compress very well.
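If you want to check the compression claim on your own data, a quick experiment along these lines might help. It uses the standard `zlib` module on made-up records (the `plateNumber` values and the record count are assumptions) and compares the cost of the extra field before and after compression.

```python
import json
import zlib

# Synthetic data: the same records with and without the repeated boolean.
with_field = [
    {"plateNumber": f"ABC{i:05d}", "exceptionCase": False}
    for i in range(10_000)
]
without_field = [{"plateNumber": r["plateNumber"]} for r in with_field]

raw_with = json.dumps(with_field).encode("utf-8")
raw_without = json.dumps(without_field).encode("utf-8")

# Compress both at the highest zlib level.
zip_with = zlib.compress(raw_with, 9)
zip_without = zlib.compress(raw_without, 9)

# How many bytes the extra field costs, uncompressed vs. compressed.
raw_overhead = len(raw_with) - len(raw_without)
zip_overhead = len(zip_with) - len(zip_without)
print("raw overhead:", raw_overhead, "bytes")
print("compressed overhead:", zip_overhead, "bytes")
```

Real data is less regular than this, so treat the result as an upper bound on how well things compress, and measure with your actual files.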
… why does it need to be json?
What about using enums? In this case you will have to specify them for all records, but this ensures that the field will always be present.
enum license_owner {
    regular_citizen = 0,
    embassy,
    government,
    ...
}
I've heard about enums before, but I never really paid attention to them since I never needed to use them in any of my projects until now. I think this is exactly what I need. I'll research them more.
Thank you so much for your help
Depending on your needs you can also break it into a columnar format with some standard compression on top. This allows you to search individual fields without looking at the rest.
It also compresses exceptionally well, and "rare" fields will be null in most records, so run-length encoding will compress them to near zero.
See, for example, Parquet.
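To show the idea without pulling in a Parquet library, here is a toy Python sketch of a columnar layout with run-length encoding. It is not how Parquet stores data internally; `to_columns` and `run_length_encode` are hypothetical helpers, and the records are invented.

```python
from itertools import groupby

# Row-oriented records, as they might appear in a JSON array.
records = [
    {"plateNumber": "ABC123", "specialCase": None},
    {"plateNumber": "DEF456", "specialCase": None},
    {"plateNumber": "GOV001", "specialCase": "government"},
    {"plateNumber": "GHI789", "specialCase": None},
    {"plateNumber": "JKL012", "specialCase": None},
]


def to_columns(rows):
    # Turn a list of records into one list per field.
    # Assumes every row has the same keys as the first one.
    keys = rows[0].keys()
    return {key: [row[key] for row in rows] for key in keys}


def run_length_encode(values):
    # Collapse consecutive equal values into (value, count) pairs.
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(records)
encoded = run_length_encode(columns["specialCase"])
print(encoded)
```

A mostly-null column collapses into a handful of pairs, and a query on `specialCase` only has to touch that one column.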
If they are mutually exclusive special cases, using an enum like another comment mentioned makes sense, and can limit the special cases to one field. You can use an enum of strings if you want it to be more readable.
As for how the data is represented, only including the special case field when there is one makes sense as well. Keep in mind JSON is also a flexible format - you can even have the array contain mixed types, like strings for simple licenses, and objects for more complex licenses. That can reduce the size of the JSON document quite a bit, if that's an option.
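A reader for such a mixed array could normalize both shapes into one, roughly like this. The sketch assumes simple licenses are bare plate-number strings and complex ones are objects with `plateNumber` and an optional `specialCase` key; those names and the `normalize` helper are made up for illustration.

```python
import json

# Mixed array: bare strings for simple licenses, objects for special ones.
raw = '["ABC123", {"plateNumber": "GOV001", "specialCase": "government"}, "XYZ789"]'


def normalize(entry):
    # Expand the compact string form into the full object shape.
    if isinstance(entry, str):
        return {"plateNumber": entry, "specialCase": None}
    return {
        "plateNumber": entry["plateNumber"],
        "specialCase": entry.get("specialCase"),
    }


plates = [normalize(entry) for entry in json.loads(raw)]
print(plates)
```

The file stays small, while the rest of the code only ever sees one record shape with all keys present.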
Convert the JSON to S3 keys and store it as a file structure.