Description
I was able to investigate and resolve this issue. Please see this as a way to potentially fix it for everybody else.
See the Solution heading below. I have also described this solution in my Slack conversation with Ashutosh Singh.
What happened:
When following the custom-eval-tutorial, connection errors occur: the OpenMatch director is unable to call the evaluator (connection and synchronizer errors).
Once the connection errors are resolved, duplicate matchID errors appear because of a faulty matchfunction in the tutorial. Please see the logs below.
Simulated frontend creates tickets successfully:
2023/07/12 17:21:40 Ticket created successfully, id: cine2947ck9ji9cojvg0
2023/07/12 17:21:40 Ticket created successfully, id: cine2947ck9ji9cojvgg
2023/07/12 17:21:40 Ticket created successfully, id: cine2947ck9ji9cojvh0
2023/07/12 17:21:40 Ticket created successfully, id: cine2947ck9ji9cojvhg
...
Matchfunction keeps sending over the proposals, but they don't seem to be resolved:
2023/07/12 17:21:37 TCP net listener initialized for port 50502
2023/07/12 17:21:44 Generating proposals for function mode_based_profile
2023/07/12 17:21:44 Generating proposals for function mode_based_profile
2023/07/12 17:21:44 Generating proposals for function mode_based_profile
2023/07/12 17:21:45 Streaming 6 proposals to Open Match
2023/07/12 17:21:45 Streaming 6 proposals to Open Match
2023/07/12 17:21:45 Streaming 5 proposals to Open Match
...
2023/07/12 17:22:12 Generating proposals for function mode_based_profile
2023/07/12 17:22:12 Generating proposals for function mode_based_profile
2023/07/12 17:22:12 Generating proposals for function mode_based_profile
2023/07/12 17:22:12 Streaming 28 proposals to Open Match
2023/07/12 17:22:12 Streaming 27 proposals to Open Match
2023/07/12 17:22:12 Streaming 32 proposals to Open Match
Evaluator is only listening, but there is nothing else:
time="2023-07-12T17:21:38Z" level=info msg="TCP net listener initialized" app=evaluator component=evaluator.server port=50508
And the director cannot connect to the evaluator:
2023/07/12 17:21:37 Fetching matches for 3 profiles
2023/07/12 17:21:48 Failed to fetch matches for profile mode_based_profile, got rpc error: code = Unknown desc = error(s) in FetchMatches call. syncErr=[error receiving match from synchronizer: rpc error: code = Unknown desc = error calling evaluator: error starting evaluator call: rpc error: code = Unavailable desc = last resolver error: dns: A record lookup error: lookup open-match-evaluator on 10.96.0.10:53: server misbehaving], mmfErr=[<nil>]
2023/07/12 17:21:49 Failed to fetch matches for profile mode_based_profile, got rpc error: code = Unknown desc = error(s) in FetchMatches call. syncErr=[error receiving match from synchronizer: rpc error: code = Unknown desc = error calling evaluator: error starting evaluator call: rpc error: code = Unavailable desc = last resolver error: dns: A record lookup error: lookup open-match-evaluator on 10.96.0.10:53: server misbehaving], mmfErr=[<nil>]
Solution
Further investigation led me back to the documentation, where the configmap and all of the OpenMatch services are referenced with an om- prefix instead of an open-match- prefix. The current YAML installation files contain open-match-configmap-override, not om-configmap-override as the documentation suggests:
In the Installation page:
Custom evaluator tutorial overrides the wrong configmap in this case:
apiVersion: v1
kind: ConfigMap
metadata:
  name: om-configmap-override
  namespace: open-match
...
Meanwhile, the YAML installation files for the OpenMatch core components point to an open-match-configmap-override configmap.
part of 01-open-match-core.yaml (note configMap name):
...
spec:
  volumes:
  - name: om-config-volume-default
    configMap:
      name: open-match-configmap-default
  - name: om-config-volume-override
    configMap:
      name: open-match-configmap-override
...
In my case, this name mismatch is what caused the problem. Changing the configmap name in customization.yaml to open-match-configmap-override resolved the connection errors.
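The mismatch described above can be checked mechanically. The following is a minimal sketch, assuming the simple configMap/name layout shown in the 01-open-match-core.yaml excerpt; referencedConfigMaps and isReferenced are hypothetical helpers, and the scan is line-based rather than a real YAML parser.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// referencedConfigMaps collects the names that directly follow a
// "configMap:" key. It only handles the two-line layout used in the
// excerpt and is not a general YAML parser.
func referencedConfigMaps(manifest string) []string {
	var names []string
	expectName := false
	sc := bufio.NewScanner(strings.NewReader(manifest))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "configMap:":
			expectName = true
		case expectName && strings.HasPrefix(line, "name:"):
			names = append(names, strings.TrimSpace(strings.TrimPrefix(line, "name:")))
			expectName = false
		default:
			expectName = false
		}
	}
	return names
}

// isReferenced reports whether the manifest mounts a configmap with this name.
func isReferenced(manifest, name string) bool {
	for _, n := range referencedConfigMaps(manifest) {
		if n == name {
			return true
		}
	}
	return false
}

// coreManifest reproduces the excerpt from 01-open-match-core.yaml.
const coreManifest = `spec:
  volumes:
  - name: om-config-volume-default
    configMap:
      name: open-match-configmap-default
  - name: om-config-volume-override
    configMap:
      name: open-match-configmap-override
`

func main() {
	for _, name := range []string{"om-configmap-override", "open-match-configmap-override"} {
		fmt.Printf("%s referenced by core manifest: %v\n", name, isReferenced(coreManifest, name))
	}
}
```

With the excerpt as input, only open-match-configmap-override is reported as referenced, so a configmap created under the om- name would never be mounted by the core components.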
After that, another issue appeared: the matchfunction generated duplicate matchIDs, because the IDs were built only from the profile name, a timestamp, and a counter, without any UUID.
I changed the Go code from this:
matches = append(matches, &pb.Match{
	MatchId:       fmt.Sprintf("profile-%v-time-%v-%v", p.GetName(), time.Now().Format("2006-01-02T15:04:05.00"), count),
	MatchProfile:  p.GetName(),
	MatchFunction: matchName,
	Tickets:       matchTickets,
})
To this:
// Requires importing "github.com/google/uuid".
matches = append(matches, &pb.Match{
	MatchId:       fmt.Sprintf("p-%v-id-%v-t-%v-c-%v", p.GetName(), uuid.New().String(), time.Now().Format("2006-01-02T15:04:05.00"), count),
	MatchProfile:  p.GetName(),
	MatchFunction: matchName,
	Tickets:       matchTickets,
})
What you expected to happen:
For the cluster to run successfully, with all logs reflecting a working OpenMatch solution.
How to reproduce it (as minimally and precisely as possible):
Follow the documentation YAML installation instructions, and the custom-eval-tutorial
Output of kubectl version:
clientVersion:
  buildDate: "2023-05-17T14:20:07Z"
  compiler: gc
  gitCommit: 7f6f68fdabc4df88cfea2dcf9a19b2b830f1e647
  gitTreeState: clean
  gitVersion: v1.27.2
  goVersion: go1.20.4
  major: "1"
  minor: "27"
  platform: windows/amd64
kustomizeVersion: v5.0.1
serverVersion:
  buildDate: "2023-05-17T14:13:28Z"
  compiler: gc
  gitCommit: 7f6f68fdabc4df88cfea2dcf9a19b2b830f1e647
  gitTreeState: clean
  gitVersion: v1.27.2
  goVersion: go1.20.4
  major: "1"
  minor: "27"
  platform: linux/amd64
Cloud Provider/Platform (AKS, GKE, Minikube etc.):
Docker Desktop with K8s enabled.
Open Match Release Version:
1.7
Install Method(yaml/helm):
YAML