It is probably best to start right away with the second point, since I assume it is the priority.
Binarization (thresholding) is essentially a process that minimizes the amount of information contained in an image. This fact is both an advantage and a disadvantage, because along with irrelevant information, data that could significantly help in solving the problem is erased as well.
If we turn to the "Photoshop" image, in particular to some of the polygons in the center whose borders are marked with red lines, it is easy to notice white "zones" to their right. These "zones" are the same size and, in general, do not stand out in any way from identical white "zones" that already fall inside the areas of interest (the parking spaces).
White "zones" within an area of interest can also be of arbitrary size, and often cannot be treated as a single whole together with others located nearby, since they have independent boundaries and are close in size to the "zones" that lie outside the area of interest.
In addition, "zones" belonging to two different areas of interest can be closer to each other (measured by the distance between the centroids of their contours) than "zones" of the same parking space.
All of the above exceptions are fully present in the "Photoshop" image, which effectively rules out correct clustering of the "zone" contours into "car" objects.
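As an aside, the contour centroids used for these distance comparisons are derived from the spatial moments m00, m10, m01, which cv::moments() computes for a contour via Green's theorem. A minimal self-contained sketch of the same computation for a polygonal contour (plain coordinate pairs are used here instead of cv::Point for brevity; this is an illustration, not the OpenCV implementation itself):

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Spatial moments of a closed polygon via Green's theorem -- the same
// quantities (m00, m10, m01) that cv::moments() returns for a contour.
// The centroid used for contour-to-contour distances is (m10/m00, m01/m00).
std::pair<double, double>
polygon_centroid(const std::vector<std::pair<double, double>>& pts)
{
    double m00 = 0.0, m10 = 0.0, m01 = 0.0;
    const std::size_t n = pts.size();
    for (std::size_t i = 0; i < n; ++i) {
        const auto& [x0, y0] = pts[i];
        const auto& [x1, y1] = pts[(i + 1) % n];
        const double cross = x0 * y1 - x1 * y0;
        m00 += cross;             // accumulates 2 * signed area
        m10 += (x0 + x1) * cross; // accumulates 6 * integral of x
        m01 += (y0 + y1) * cross; // accumulates 6 * integral of y
    }
    m00 /= 2.0;
    m10 /= 6.0;
    m01 /= 6.0;
    return { m10 / m00, m01 / m00 };
}
```

For an axis-aligned square with corners (0,0) and (4,4) this yields the expected centroid (2, 2).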
With binarization, it is extremely rare to get an acceptable result if the original image contains a natural environment.
However, if a person is allowed to intervene in the algorithm's work, namely, to hint at the real boundaries of an object by manually selecting the rectangles of parking spaces, the whole task is reduced to a fairly simple operation. Here is an example of image binarization in C++ and C# (EmguCV).
Suppose we have a source image:

Load it, convert it to grayscale, and binarize it:
C++:
cv::Mat src_mat = cv::imread("img.jpg");
if (src_mat.empty())
    return;

// Convert to grayscale.
cv::Mat gry_mat;
cv::cvtColor(src_mat, gry_mat, cv::COLOR_BGR2GRAY);

// Two thresholds: keep near-white pixels and near-black pixels.
cv::Mat bin1_mat, bin2_mat;
cv::threshold(gry_mat, bin1_mat, 230, 255, cv::THRESH_BINARY);
cv::threshold(gry_mat, bin2_mat, 25, 255, cv::THRESH_BINARY_INV);

// Combine both binary masks.
cv::Mat bin_mat = bin1_mat + bin2_mat;
C#:
Mat srcMat = new Mat("img.jpg", LoadImageType.AnyColor);
if (srcMat.IsEmpty)
    return;

Mat grayMat = new Mat();
Mat bin1Mat = new Mat();
Mat bin2Mat = new Mat();
Mat binMat = new Mat();

// Convert to grayscale, threshold against both bounds, combine the masks.
CvInvoke.CvtColor(srcMat, grayMat, ColorConversion.Bgr2Gray);
CvInvoke.Threshold(grayMat, bin1Mat, 230, 255, ThresholdType.Binary);
CvInvoke.Threshold(grayMat, bin2Mat, 25, 255, ThresholdType.BinaryInv);
CvInvoke.Add(bin1Mat, bin2Mat, binMat);
Here an attempt was made to binarize against both an upper and a lower threshold. The coefficients were picked by hand and may be suboptimal for other images. This way more useful detail is preserved:

Next, if we find the contours and draw them:
C++:
std::vector<std::vector<cv::Point>> contours;
cv::findContours(bin_mat.clone(), contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

cv::Mat ctr_mat = cv::Mat::zeros(bin_mat.size(), CV_8U);
cv::drawContours(ctr_mat, contours, -1, cv::Scalar::all(255), 1);
C#:
VectorOfVectorOfPoint contours = new VectorOfVectorOfPoint();
Mat hierarchy = new Mat();
CvInvoke.FindContours(binMat, contours, hierarchy, RetrType.External, ChainApproxMethod.ChainApproxSimple);

Mat ctrMat = new Mat(binMat.Size, DepthType.Cv8U, 3);
CvInvoke.DrawContours(ctrMat, contours, -1, new MCvScalar(255));
... then we get this picture:

There turned out to be quite a lot of contours, and doubt creeps in that the contours of individual parts can be assembled into a single vehicle contour. Let's try to filter some of them out, for example by area, while also drawing their bounding rectangles on the source image:
C++:
std::vector<std::vector<cv::Point>> contours;
cv::findContours(bin_mat.clone(), contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

// Keep only contours with an area above 8 px^2 and draw their bounding boxes.
std::vector<std::vector<cv::Point>> contours2;
for (auto itr = contours.begin(); itr != contours.end(); ++itr)
{
    if (cv::contourArea(*itr) > 8)
    {
        contours2.push_back(*itr);
        cv::Rect rc = cv::boundingRect(*itr);
        cv::rectangle(src_mat, rc, cv::Scalar(0, 0, 255));
    }
}

cv::Mat ctr_mat = cv::Mat::zeros(bin_mat.size(), CV_8U);
cv::drawContours(ctr_mat, contours2, -1, cv::Scalar::all(255), 1);
C#:
VectorOfVectorOfPoint contours = new VectorOfVectorOfPoint();
VectorOfVectorOfPoint contours2 = new VectorOfVectorOfPoint();
Mat hierarchy = new Mat();
CvInvoke.FindContours(binMat, contours, hierarchy, RetrType.External, ChainApproxMethod.ChainApproxSimple);

// Keep only contours with an area above 8 px^2 and draw their bounding boxes.
for (int i = 0; i < contours.Size; ++i)
{
    if (CvInvoke.ContourArea(contours[i]) > 8)
    {
        contours2.Push(contours[i]);
        Rectangle rc = CvInvoke.BoundingRectangle(contours[i]);
        CvInvoke.Rectangle(srcMat, rc, new MCvScalar(0, 0, 255));
    }
}

Mat ctrMat = new Mat(binMat.Size, DepthType.Cv8U, 3);
CvInvoke.DrawContours(ctrMat, contours2, -1, new MCvScalar(255));
There are far fewer contours now and the "noise" has diminished, but in doing so we keep losing information about the objects of interest (the cars), making them ultimately less and less similar to the originals. The ones closest to the camera are still more or less recognizable, but the farther away a car is, the more pitiable the situation becomes:

In the end, it may be worth drawing the rectangles that enclose the contours; then it becomes clearer what one could latch onto in order to unite the disparate parts of each car into a single whole:

But the shooting angle leaves no room for doubt: parts of different objects lie at roughly the same distances from one another as parts of the same object, which makes correct clustering impossible. And since binarization simply offers no other feature to latch onto, the task of automatically determining the boundaries of the cars under these conditions becomes unsolvable. Well, perhaps except for one or two cars closest to the camera, but no more.
In this case, there is nothing left but to determine the boundaries of the parking spaces manually. This task is usually not delegated to OpenCV, because its built-in functionality for working with the mouse (needed so that the user can mark the coordinates of the parking-space polygon vertices on the image) is rather limited, and such interaction is normally implemented with general-purpose UI frameworks.
To obtain the image matrix of a parking space from predetermined coordinates in a frame, you can use a simple construction:
cv::Mat parking_lot_mat = src_mat(cv::Rect(0,0,100,100)).clone();
Now parking_lot_mat contains the part of the original image starting at coordinates x = 0, y = 0, with a size of 100x100 pixels.
Next, we perform the same binarization work that was done on the full image. Finally, it remains to find the contours and compute their areas using cv::contourArea(). If the sum of the contour areas exceeds a certain empirically determined threshold, a car in that parking space can be considered detected.