Ideally, time-lapse seismic data from different vintages should be identical except at the target area (i.e., the reservoir). In practice, it is almost impossible to acquire identical data: factors such as differences in source and receiver positioning and near-surface velocity variations introduce 4D noise and reduce the repeatability of the data. To enhance the 4D signal and suppress this noise, time-lapse cross-equalization methods match the monitor data to the baseline data. Here, we propose to perform the cross-equalization using deep learning models. Specifically, we train a convolutional autoencoder on the baseline data and then use a separate fully connected neural network, operating in the latent space, to predict the matching. We apply the approach to a synthetic dataset and show improved repeatability, assessed by imaging the reservoir and computing the normalized root mean square (NRMS).
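
The normalized root mean square used above as the repeatability measure is not defined in this abstract. A minimal sketch, assuming the definition commonly used in time-lapse seismic work, NRMS = 200 × RMS(monitor − base) / (RMS(monitor) + RMS(base)), is given below. The function names `rms` and `nrms` are hypothetical, and each trace is assumed to be a plain sequence of amplitude samples taken over the same time window.

```python
import math


def rms(trace):
    """Root mean square amplitude of a non-empty sequence of samples."""
    return math.sqrt(sum(sample * sample for sample in trace) / len(trace))


def nrms(base, monitor):
    """NRMS difference in percent between two equal-length traces.

    Assumed definition (not stated in the abstract):
        200 * RMS(monitor - base) / (RMS(monitor) + RMS(base))
    Identical traces give 0; traces of opposite polarity give 200.
    """
    if len(base) != len(monitor):
        raise ValueError("base and monitor must have the same number of samples")
    denominator = rms(base) + rms(monitor)
    if denominator == 0.0:
        # Both traces are all zeros, so there is no difference to measure.
        return 0.0
    difference = [m - b for b, m in zip(base, monitor)]
    return 200.0 * rms(difference) / denominator
```

Under this definition, lower values indicate better repeatability, so a successful cross-equalization should lower the NRMS between the matched monitor and the baseline outside the reservoir.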