
Efficient data handling with GraphQL and Apollo


Problem

Modern web applications and even relatively simple websites rely heavily on dynamic data. Just think of a simple example: a Twitter post.

Although it might look simple, in reality the page contains lots of data from different sources. You are unlikely to load everything from the same endpoint or via a single query. Different sections will be loaded separately:

  • Root tweet/post
  • Counters for views, likes, retweets, etc.
  • Comments
    • Each comment could resolve its own data, just like the main tweet
  • Relevant posts section

How would you manage this data flow manually?

In recent years there was a trend to push any and all data into the Redux store. Although I see this approach less and less, there are still many projects that do it or keep the legacy code around.

Don't get me wrong, there is nothing bad about storing data in Redux or any other global state. But it really depends on HOW you store it. And unfortunately, most projects just dump data in whatever shape the API provided it and then struggle to read it in the UI. The store becomes bloated with data due to under-fetching and over-fetching, partially normalised and denormalised, mixed, some of it stored as lists and some as maps. The further it goes, the more chaotic and uncontrolled it gets. In the end, it impacts the UX, slows the application down, and makes ongoing development more expensive.


You might think this could only be a REST API problem on the client side, but it does not matter what protocol, communication pattern, or approach you use, as the data's final destination is the same. You can make it really bad with GraphQL too!

Surely, we can invest in better code quality: think more about data merging, control over-fetching, update stores on mutations, and so on.

But for the same reason you use tools like React and not jQuery, you would benefit from using Apollo instead of manually managed stores.

Reasons to use Apollo Client

It's important to emphasise that the listed benefits matter most for Single Page Applications (SPAs), although they can still be useful for SSR applications.

Efficient cache

Apollo Client has a clever storage mechanism where all the data is normalised and kept in an in-memory cache. Data freshness is handled automatically: Apollo decides when it's time to request fresh data, and it's possible to override the default behaviour.
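
For orientation, a minimal client setup with the normalised in-memory cache could look like this (the endpoint URL is a placeholder):

client.ts
import { ApolloClient, InMemoryCache } from '@apollo/client';

// Every query result is normalised into this single in-memory cache
export const client = new ApolloClient({
	uri: 'https://example.com/graphql', // placeholder endpoint
	cache: new InMemoryCache(),
});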

The cache is smart, so data is stored as entities, and the default cache ID is quite simple: EntityType:EntityId. Hence, automatic data merging is done internally, preventing redundant data storage. As a side effect of data merging, some fields are updated with the latest version. Yes, this approach has problems, as the data can become stale, but you can achieve optimal results by using Apollo's fetch policies correctly.

Let's consider a simple example where we execute a query for user data three times, but for two users. Initially, we fetch two main fields for User #1: the name and the last login date. That data is stored in the cache under the User:1 cache ID.

The second query is almost the same, but we fetch the email instead of the name. Since Apollo knows which entity and ID it is, it merges the data: the cache still holds only one User entity with ID 1, but with the combined fields of both queries. Note that the last login date could also be updated if the value changed on the server.

Fetching the data of User #2 has no effect on User #1 and is stored separately.
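
To make the example concrete, here is a sketch of what the first two queries could look like; the user(id:) field and the lastLogin field are assumptions, not a real schema:

queries.ts
import { gql } from '@apollo/client';

// First query: fetches the name and last login date; cached under User:1
const userNameQuery = gql`
	query UserName {
		user(id: 1) {
			id
			name
			lastLogin
		}
	}
`;

// Second query: fetches the email instead of the name; Apollo merges
// the result into the same User:1 cache entry
const userEmailQuery = gql`
	query UserEmail {
		user(id: 1) {
			id
			email
			lastLogin
		}
	}
`;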

Data Referencing

As the data is normalised, nested data is linked via references, and the cache remains flat. As expected, the cache ID is the connecting point here. Let's look at this computed cache example:

cache.json
{
	"User:1": {
		"id": 1,
		"name": "John",
		"comments": [
			{ "__ref": "Comment:1" },
			{ "__ref": "Comment:2" }
		]
	},
	"Comment:1": {
		"id": 1,
		"text": "First comment"
	},
	"Comment:2": {
		"id": 2,
		"text": "Second comment"
	}
}

As a developer, you will rarely have to think about this, as __ref is Apollo's internal key for connecting the data. However, you might need to touch it when you define a custom merge function, which is most often done for merging paginated lists.
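
For illustration, here is a minimal sketch of such a custom merge function, assuming a hypothetical paginated comments field on the root Query type:

cache.ts
import { InMemoryCache } from '@apollo/client';

const cache = new InMemoryCache({
	typePolicies: {
		Query: {
			fields: {
				// Hypothetical paginated field; adjust to your schema
				comments: {
					// Ignore pagination arguments so all pages share one cache entry
					keyArgs: false,
					// Append the incoming page of refs to the already cached ones
					merge(existing = [], incoming: any[]) {
						return [...existing, ...incoming];
					},
				},
			},
		},
	},
});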

Fetch policy

Apollo has a powerful API, and the fetch policy is a good example. With a configured fetch policy, we can decide how data should be handled for a specific query.

By default, all queries run with the cache-first policy to minimise the number of network calls and rely more on the already stored cache. Although this is a great strategy for data that is rarely updated, there might be better options for frequently updated data.

Letā€™s look at the available fetch policy options:

  • cache-first
  • cache-and-network
  • cache-only
  • network-only
  • no-cache
  • standby

I think using cache-and-network, if not cache-first, is enough for most use cases. With this policy, the data is displayed immediately if available in the cache, and a background network request is executed to re-fetch it. When done, the cache is updated, and the UI re-renders with the appropriate changes.

Let's look at an example where the user data is already available, but we refetch it with the cache-and-network policy.
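
Here is a sketch of what that could look like in a component; userPreviewQuery is an assumed query fetching the user's data:

UserProfile.tsx
import React from 'react';
import { useQuery } from '@apollo/client';

const UserProfile: React.FC = () => {
	// Cached data renders immediately, while a background request
	// re-fetches it and updates the cache once it completes
	const { data, loading } = useQuery(userPreviewQuery, {
		fetchPolicy: 'cache-and-network',
	});

	return (
		<div>
			{loading && 'Refreshing...'}
			{data && data.userPreview.name}
		</div>
	);
};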

Even if the data is already available, the loading property will still change. Hence, a developer must decide what kind of indicator (if any) to show while the background refetch is running.

To understand the other policies, please refer to the official docs.

Mutations without redundancy

Another benefit of the normalised cache is reduced redundancy around mutations.

Imagine a scenario where there is a page with the user's comments and another page to edit the user's details. The goal is to edit the user's name, which is also visible on all the comments the user has posted. Without a normalised cache, the comments data would need to be refetched to update the stale user name after the change. But with Apollo, the change is applied automatically after the mutation, as long as the mutation returns the updated entity.

mutation.graphql
mutation UpdateUserName {
	updateUserName(id: 1, name: "Sam") {
		id
		name
	}
}

For mutations, the server can return any data, and it makes sense to return what was updated to reduce the number of network calls.

Let's look at an example where we load comments from different users. As a result, we store both Comment and User entities in the normalised cache. All the comments, with the authors' names, are displayed in the UI.

Then, by executing the UpdateUserName mutation, we update the User:1 entity in the cache, as the mutation returned an updated value for the referenced data. The UI immediately reflects the change once the cache is updated.
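
As a sketch, firing that mutation from a component could look like this; the updateUserName mutation field is an assumption about the server schema:

RenameButton.tsx
import React from 'react';
import { gql, useMutation } from '@apollo/client';

const updateUserNameMutation = gql`
	mutation UpdateUserName {
		updateUserName(id: 1, name: "Sam") {
			id
			name
		}
	}
`;

const RenameButton: React.FC = () => {
	const [updateUserName] = useMutation(updateUserNameMutation);

	// The returned id and name are merged into User:1, so every query
	// referencing that entity re-renders automatically
	return <button onClick={() => updateUserName()}>Rename to Sam</button>;
};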

Optimistic updates

Another great feature of Apollo Client is the easy handling of optimistic updates. Although this concept is not new and might not fit all scenarios, it's a nice UX improvement to have.

With the intuitive API, we can define what the mutation would return in the optimistic scenario, and all other UIs relying on queries that fetch the referenced data are updated automatically and instantly while the background request is still running.

Apollo is smart, so the cache is not overwritten straight away. Instead, a separate version of the entity is stored alongside it until the request finishes. This ensures that the original, non-optimistic data is not broken if the response for the optimistic update differs from what was expected.

Let's look at an example similar to the previous one. While the list of comments is already rendered, the user changes the name, this time with an optimistic update. The referencing UI is updated immediately since we changed the name, although loading still holds the value true because the background request has not finished yet. Once it has, loading changes to false, and the cache is updated.
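
Here is a sketch of the optimistic version, reusing the assumed updateUserNameMutation from the previous example:

OptimisticRename.tsx
import React from 'react';
import { useMutation } from '@apollo/client';

// Assumes updateUserNameMutation from the previous sketch is in scope
const OptimisticRename: React.FC = () => {
	const [updateUserName, { loading }] = useMutation(updateUserNameMutation, {
		// The response we expect from the server, written to the cache
		// instantly; queries referencing User:1 re-render right away
		optimisticResponse: {
			updateUserName: {
				__typename: 'User',
				id: 1,
				name: 'Sam',
			},
		},
	});

	return (
		<button onClick={() => updateUserName()} disabled={loading}>
			{loading ? 'Saving...' : 'Rename to Sam'}
		</button>
	);
};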

Easy error handling

With the easy-to-use React Apollo hooks, we can handle errors in two ways:

  • via the error property returned by the hook
  • via the onError callback passed in the query options

Both approaches shine in different cases.

With the error property, we can easily adjust the UI to reflect that something went wrong. Here is a little example:

UserPreview.tsx
const UserPreview: React.FC = () => {
	const { data, loading, error } = useQuery(userPreviewQuery);
	
	return (
		<div>
			{loading && 'Loading...'}
			{error && `Error: ${error.message}`}
			{data && `${data.userPreview.name} ${data.userPreview.lastName}`}
		</div>
	);
}

Now, we don't have to handle the error case manually, storing the details in state or anywhere else. Instead, the error property is there for us, providing a simple way to handle all the different states.

With the other available option, the onError callback, we can handle scenarios with indirect changes. The simplest example is triggering UI notifications via an imperative API:

UserPreview.tsx
const UserPreview: React.FC = () => {
	const { data, loading } = useQuery(
		userPreviewQuery,
		{ onError: (error) => Notification.showError(error) }
	);
	
	return (
		<div>
			{loading && 'Loading...'}
			{data && `${data.userPreview.name} ${data.userPreview.lastName}`}
		</div>
	);
}

Alternatives

As Apollo Client is fully focused on GraphQL, not every project can use it.

Two well-known packages provide a developer experience similar to Apollo Client but are made for any communication approach, usually a REST API.

Both are production-ready and widely used. Both are made to fix common struggles with storing and sharing data. But they are built differently and may follow different ideas and approaches.

Conclusion

Apollo Client is a great tool, especially for Single Page Applications, where data is duplicated from the server onto the client.

Its API is easy and intuitive to use with hooks or direct queries, and the cache is smart and effective.

With the additional power of GraphQL, we can do more with less code and less room for error.

Overall, I would call it recommended for SPAs and nice to have for SSR applications.