The Proximal Policy Optimization (PPO) algorithm is the ML-Agents toolkit's default reinforcement learning algorithm. It alternates between sampling data through interaction with the environment and optimizing a "surrogate" objective with stochastic gradient descent. When creating a new machine learning model, it is difficult to know the optimal model architecture for a given project in advance. In most cases, we can either rely on the algorithm's default values or let the machine carry out this exploration and automatically select the best configuration. Because hyperparameters define the model architecture, searching for the best model is called hyperparameter tuning. We compare four PPO hyperparameters (beta, epsilon, lambd, and num_epoch) in a maze-solving task. The training results show how performance differs across hyperparameter choices, and the appropriate settings depend on the complexity of the maze and of the agent's actions. This thesis aims to guide hyperparameter choices in concrete, practical projects. Code is available at hungpt17102k/Maze-Solving-ML-Agent (github.com).
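To illustrate how some of these hyperparameters enter PPO's surrogate objective, the following is a minimal NumPy sketch of the clipped surrogate term, not the ML-Agents implementation: epsilon is the clipping range, while (in ML-Agents) beta scales the entropy bonus, lambd is the GAE smoothing parameter, and num_epoch is the number of SGD passes per batch.

```python
import numpy as np

def ppo_clipped_surrogate(ratios, advantages, epsilon=0.2):
    """Clipped surrogate objective of PPO.

    ratios:     pi_new(a|s) / pi_old(a|s) for each sampled action
    advantages: advantage estimates A(s, a)
    epsilon:    clipping range; limits how far the policy can move per update
    """
    unclipped = ratios * advantages
    clipped = np.clip(ratios, 1.0 - epsilon, 1.0 + epsilon) * advantages
    # Taking the element-wise minimum removes the incentive to push the
    # probability ratio outside [1 - epsilon, 1 + epsilon].
    return np.mean(np.minimum(unclipped, clipped))

# With a positive advantage, the gain is capped at (1 + epsilon) * A:
print(ppo_clipped_surrogate(np.array([2.0]), np.array([1.0])))  # 1.2
```

A smaller epsilon makes each policy update more conservative, which is one reason the best value can differ with maze complexity.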