First, you need to understand how the interaction with the backend works. Broadly, there are two options: HTTP and WebSocket. From your description it is clear that you are considering HTTP, which assumes client-server interaction. That is, the server listens at a certain address (URL) on a specific TCP port and waits for a client to request a connection. The interaction therefore always begins at the client's initiative. There is a lot of information about the HTTP protocol on the Internet, including the specification (somewhat dated by now, but still relevant).
HTTP 1.1, which is currently the version mainly used for this kind of interaction, is a text-based protocol. That is, after the connection is established, the client sends the server a request consisting of a stream of characters. The server processes this stream in some way and returns a stream of characters to the client in response. That's all: the "request"/"response" cycle is over. So, you need two libraries. The first is an HTTP server that can listen on the specified TCP port, hand you the request as a text stream, and let you write a response stream. Many HTTP server libraries also provide convenient tools for parsing URL strings, HTTP headers, etc. In the simplest case for Java this is, for example, Grizzly. The second is an HTTP client on Android, which will send HTTP requests to the server and receive the responses.
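To make the request/response cycle concrete, here is a minimal sketch. Note that it does not use Grizzly: it assumes the small HTTP server that ships with the JDK (`com.sun.net.httpserver`) so that it runs without extra dependencies. The class name `HelloBackend`, the `/hello` path and the response body are made up for the example. The `fetch` method plays the role of the client; `HttpURLConnection` also exists on Android, though a real app would probably use a higher-level client library.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class HelloBackend {

    // Server side: listen on a TCP port (0 lets the OS pick a free one).
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        // "/hello" is a hypothetical path chosen for this example.
        server.createContext("/hello", HelloBackend::handle);
        server.start();
        return server;
    }

    // Called once per request: read the request, write the response.
    static void handle(HttpExchange exchange) throws IOException {
        if (!"GET".equals(exchange.getRequestMethod())) {
            exchange.sendResponseHeaders(405, -1); // Method Not Allowed, no body
            exchange.close();
            return;
        }
        byte[] body = "{\"message\":\"hello\"}".getBytes(StandardCharsets.UTF_8);
        exchange.getResponseHeaders().set("Content-Type", "application/json; charset=utf-8");
        exchange.sendResponseHeaders(200, body.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(body);
        }
    }

    // Client side: send a GET request and return the response body as text.
    static String fetch(int port, String path) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create("http://127.0.0.1:" + port + path).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(0);
        try {
            int port = server.getAddress().getPort();
            System.out.println(fetch(port, "/hello")); // prints {"message":"hello"}
        } finally {
            server.stop(0);
        }
    }
}
```

For manual testing you would instead call `start(8080)` and leave the server running, then send requests to `http://localhost:8080/hello` from a browser or any HTTP client.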
Actually, the JSON that the server should send in response is also just a stream of characters. But you need to build it somehow from the data structures available to you in the programming language. In the simplest case, you can concatenate strings and variable values. In more complex cases, and for convenience and fewer errors, use dedicated libraries that convert between textual JSON or XML and the language's data structures. So, you need to decide how you will do this conversion.
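As an illustration of the "simplest case", here is a rough sketch of building JSON by concatenation. The class `JsonByHand` and the `name`/`age` fields are invented for the example. It also shows why a library quickly becomes attractive: even this tiny version has to escape quotes, backslashes and control characters by hand, otherwise the output is not valid JSON.

```java
class JsonByHand {

    // Escapes the characters that may not appear raw inside a JSON string.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become \u00XX escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Builds a JSON object from two hypothetical fields by plain concatenation.
    static String userToJson(String name, int age) {
        return "{\"name\":\"" + escape(name) + "\",\"age\":" + age + "}";
    }

    public static void main(String[] args) {
        // prints {"name":"Ann \"the admin\"","age":30}
        System.out.println(userToJson("Ann \"the admin\"", 30));
    }
}
```

With nested objects, arrays or optional fields this approach gets error-prone fast, which is where a JSON library pays off.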
Finally, you need to deploy your server somewhere. Heroku, for example, is convenient for this.
Alternatively, you can take a ready-made JSON store such as Firebase as the backend.
For debugging the backend on your local computer, it is very convenient to use HTTP clients like Postman. If you have more specific questions, ask.
If you want to start doing something more complex and interesting, I highly recommend getting acquainted with Elixir and Phoenix, which run on the Erlang virtual machine.
As you gain experience, you will come to understand that server logic has many responsibilities, and it is inconvenient to deal with them all in one heap. It turns out that it is best to separate the logic that talks to external devices and protocols (database, network, disk, queue) from the core that implements the key business logic of the system. That is the time to get acquainted with Hexagonal Architecture. Later it turns out that all this has to process dozens or hundreds of requests from different users at the same time, and then you can learn the Actor model, or another of the existing abstractions for parallel and concurrent programming.
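To give a rough idea of the separation described above, here is a very small sketch in the spirit of Hexagonal Architecture. All names (`UserStore`, `GreetingService`, `InMemoryUserStore`) are invented for the illustration. The core depends only on an interface (a "port"), and the concrete storage (an "adapter") can be swapped without touching the business logic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class HexagonalSketch {

    // Port: what the core needs from storage, with no mention of a database.
    interface UserStore {
        Optional<String> findName(int id);
        void save(int id, String name);
    }

    // Core: business logic that only knows about the port.
    static class GreetingService {
        private final UserStore store;

        GreetingService(UserStore store) {
            this.store = store;
        }

        String greet(int id) {
            return store.findName(id)
                    .map(name -> "Hello, " + name + "!")
                    .orElse("Hello, stranger!");
        }
    }

    // Adapter: one possible implementation of the port, here an in-memory map.
    // A database-backed adapter could replace it without changing the core.
    static class InMemoryUserStore implements UserStore {
        private final Map<Integer, String> names = new HashMap<>();

        public Optional<String> findName(int id) {
            return Optional.ofNullable(names.get(id));
        }

        public void save(int id, String name) {
            names.put(id, name);
        }
    }

    public static void main(String[] args) {
        UserStore store = new InMemoryUserStore();
        store.save(1, "Ann");
        GreetingService service = new GreetingService(store);
        System.out.println(service.greet(1)); // prints Hello, Ann!
        System.out.println(service.greet(2)); // prints Hello, stranger!
    }
}
```

The same idea applies to the HTTP layer: the request handler would be just another adapter that calls into the core.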
But, of course, none of this is necessary to write your first HTTP backend.