[Solved] How do you train a neural network without an exact answer? [closed]

Question

TLDR; Reinforcement learning

In general, training agents uses reinforcement learning. It is different than what you explained, because it seems as if you want to define a fitness heuristic to tell the agent whether it is doing OK or not, which might be biased. Reinforcement learning also has biases, but they are researchedand studied. A typical bias is a factor to determine how important previous actions are w.r.t. the current action to the current result.

With reinforcement learning, you only get positive or negative feedback for actions from time to time. You can only learn form those feedback-moments. Unfortunately, this means you only, easily learn ‘winning actions’, ‘leading to winning actions’ are harder. So you need a trick, typically something recursive in your evaluation-function, to make it work. The good news is, researchers have already come up with such tricks. You can start with temporal difference learning or Q-learning. If your model is Neural Network-based, they are trained with gradient descent typically.

Accepted Answer

TLDR; Reinforcement learning

In general, training agents uses reinforcement learning. It is different than what you explained, because it seems as if you want to define a fitness heuristic to tell the agent whether it is doing OK or not, which might be biased. Reinforcement learning also has biases, but they are researchedand studied. A typical bias is a factor to determine how important previous actions are w.r.t. the current action to the current result.

With reinforcement learning, you only get positive or negative feedback for actions from time to time. You can only learn form those feedback-moments. Unfortunately, this means you only, easily learn ‘winning actions’, ‘leading to winning actions’ are harder. So you need a trick, typically something recursive in your evaluation-function, to make it work. The good news is, researchers have already come up with such tricks. You can start with temporal difference learning or Q-learning. If your model is Neural Network-based, they are trained with gradient descent typically.