Skip to content

Conversation

0wwafa
Copy link

@0wwafa 0wwafa commented May 30, 2024

Catched exceptions in server load.

Catched exceptions in server load.
@0wwafa 0wwafa changed the title Update server.cpp Catch exceptions correctly in server.cpp May 30, 2024
@mofosyne mofosyne added the Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix label May 30, 2024
Copy link
Collaborator

@mofosyne mofosyne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that you removed the redundant else statement good catch.

Anyhow looks good to me. Let's wait for the check to pass and then merge.

@mofosyne mofosyne added merge ready indicates that this may be ready to merge soon and is just holding out in case of objections bugfix fixes an issue or bug high severity Used to report high severity bugs in llama.cpp (Malfunctioning hinder important workflow) labels May 30, 2024
Copy link
Contributor

github-actions bot commented May 30, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 531 iterations 🚀

Expand details for performance related PR only
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8811.63ms p(95)=21261.06ms fails=, finish reason: stop=472 truncated=59
  • Prompt processing (pp): avg=103.91tk/s p(95)=453.73tk/s
  • Token generation (tg): avg=45.31tk/s p(95)=45.24tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=patches commit=48b860a2b2a3042c6a95da5624f8c9625f39b4dd

prompt_tokens_seconds

More
---
config:
    xyChart:
        titleFontSize: 12
        width: 900
        height: 600
    themeVariables:
        xyChart:
            titleColor: "#000000"
---
xychart-beta
    title "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3
 duration=10m 531 iterations"
    y-axis "llamacpp:prompt_tokens_seconds"
    x-axis "llamacpp:prompt_tokens_seconds" 1717523207 --> 1717523831
    line [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 444.23, 444.23, 444.23, 444.23, 444.23, 799.48, 799.48, 799.48, 799.48, 799.48, 704.86, 704.86, 704.86, 704.86, 704.86, 739.59, 739.59, 739.59, 739.59, 739.59, 782.44, 782.44, 782.44, 782.44, 782.44, 779.32, 779.32, 779.32, 779.32, 779.32, 797.06, 797.06, 797.06, 797.06, 797.06, 813.18, 813.18, 813.18, 813.18, 813.18, 826.49, 826.49, 826.49, 826.49, 826.49, 828.43, 828.43, 828.43, 828.43, 828.43, 855.33, 855.33, 855.33, 855.33, 855.33, 852.04, 852.04, 852.04, 852.04, 852.04, 859.96, 859.96, 859.96, 859.96, 859.96, 808.46, 808.46, 808.46, 808.46, 808.46, 815.24, 815.24, 815.24, 815.24, 815.24, 819.17, 819.17, 819.17, 819.17, 819.17, 816.62, 816.62, 816.62, 816.62, 816.62, 817.78, 817.78, 817.78, 817.78, 817.78, 799.21, 799.21, 799.21, 799.21, 799.21, 781.09, 781.09, 781.09, 781.09, 781.09, 784.53, 784.53, 784.53, 784.53, 784.53, 792.34, 792.34, 792.34, 792.34, 792.34, 796.09, 796.09, 796.09, 796.09, 796.09, 797.79, 797.79, 797.79, 797.79, 797.79, 800.43, 800.43, 800.43, 800.43, 800.43, 800.9, 800.9, 800.9, 800.9, 800.9, 802.82, 802.82, 802.82, 802.82, 802.82, 814.07, 814.07, 814.07, 814.07, 814.07, 815.38, 815.38, 815.38, 815.38, 815.38, 815.63, 815.63, 815.63, 815.63, 815.63, 817.42, 817.42, 817.42, 817.42, 817.42, 820.46, 820.46, 820.46, 820.46, 820.46, 817.98, 817.98, 817.98, 817.98, 817.98, 822.84, 822.84, 822.84, 822.84, 822.84, 831.82, 831.82, 831.82, 831.82, 831.82, 827.74, 827.74, 827.74, 827.74, 827.74, 835.11, 835.11, 835.11, 835.11, 835.11, 836.87, 836.87, 836.87, 836.87, 836.87, 833.75, 833.75, 833.75, 833.75, 833.75, 833.74, 833.74, 833.74, 833.74, 833.74, 838.89, 838.89, 838.89, 838.89, 838.89, 839.62, 839.62, 839.62, 839.62, 839.62, 849.71, 849.71, 849.71, 849.71, 849.71, 857.45, 857.45, 857.45, 857.45, 857.45, 861.42, 861.42, 861.42, 861.42, 861.42, 860.84, 860.84, 860.84, 860.84, 860.84, 859.13, 859.13, 859.13, 859.13, 859.13, 859.07, 859.07, 859.07, 859.07, 859.07, 862.07, 862.07, 862.07, 862.07, 862.07, 861.56, 861.56, 861.56, 861.56, 861.56, 861.24, 861.24, 861.24, 861.24, 861.24, 865.24, 865.24, 865.24, 865.24, 865.24, 864.86, 864.86, 864.86, 864.86, 864.86, 865.9, 865.9, 865.9, 865.9, 865.9, 862.79, 862.79, 862.79, 862.79, 862.79, 866.37, 866.37, 866.37, 866.37, 866.37, 865.17, 865.17, 865.17, 865.17, 865.17, 865.45, 865.45, 865.45, 865.45, 865.45, 865.61, 865.61, 865.61, 865.61, 865.61, 866.25, 866.25, 866.25, 866.25, 866.25, 868.16, 868.16, 868.16]
                    
Loading
predicted_tokens_seconds
More
---
config:
    xyChart:
        titleFontSize: 12
        width: 900
        height: 600
    themeVariables:
        xyChart:
            titleColor: "#000000"
---
xychart-beta
    title "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3
 duration=10m 531 iterations"
    y-axis "llamacpp:predicted_tokens_seconds"
    x-axis "llamacpp:predicted_tokens_seconds" 1717523207 --> 1717523831
    line [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 43.04, 43.04, 43.04, 43.04, 43.04, 41.34, 41.34, 41.34, 41.34, 41.34, 31.71, 31.71, 31.71, 31.71, 31.71, 32.61, 32.61, 32.61, 32.61, 32.61, 33.58, 33.58, 33.58, 33.58, 33.58, 34.21, 34.21, 34.21, 34.21, 34.21, 34.87, 34.87, 34.87, 34.87, 34.87, 35.6, 35.6, 35.6, 35.6, 35.6, 35.81, 35.81, 35.81, 35.81, 35.81, 35.68, 35.68, 35.68, 35.68, 35.68, 35.63, 35.63, 35.63, 35.63, 35.63, 35.44, 35.44, 35.44, 35.44, 35.44, 34.18, 34.18, 34.18, 34.18, 34.18, 33.87, 33.87, 33.87, 33.87, 33.87, 32.65, 32.65, 32.65, 32.65, 32.65, 30.95, 30.95, 30.95, 30.95, 30.95, 30.76, 30.76, 30.76, 30.76, 30.76, 30.58, 30.58, 30.58, 30.58, 30.58, 30.94, 30.94, 30.94, 30.94, 30.94, 30.72, 30.72, 30.72, 30.72, 30.72, 30.47, 30.47, 30.47, 30.47, 30.47, 30.48, 30.48, 30.48, 30.48, 30.48, 30.57, 30.57, 30.57, 30.57, 30.57, 30.76, 30.76, 30.76, 30.76, 30.76, 30.79, 30.79, 30.79, 30.79, 30.79, 30.73, 30.73, 30.73, 30.73, 30.73, 31.03, 31.03, 31.03, 31.03, 31.03, 30.94, 30.94, 30.94, 30.94, 30.94, 30.83, 30.83, 30.83, 30.83, 30.83, 30.78, 30.78, 30.78, 30.78, 30.78, 31.03, 31.03, 31.03, 31.03, 31.03, 31.02, 31.02, 31.02, 31.02, 31.02, 31.2, 31.2, 31.2, 31.2, 31.2, 31.33, 31.33, 31.33, 31.33, 31.33, 31.36, 31.36, 31.36, 31.36, 31.36, 31.15, 31.15, 31.15, 31.15, 31.15, 31.06, 31.06, 31.06, 31.06, 31.06, 30.73, 30.73, 30.73, 30.73, 30.73, 30.64, 30.64, 30.64, 30.64, 30.64, 30.78, 30.78, 30.78, 30.78, 30.78, 30.87, 30.87, 30.87, 30.87, 30.87, 30.99, 30.99, 30.99, 30.99, 30.99, 31.06, 31.06, 31.06, 31.06, 31.06, 30.97, 30.97, 30.97, 30.97, 30.97, 30.82, 30.82, 30.82, 30.82, 30.82, 30.25, 30.25, 30.25, 30.25, 30.25, 29.6, 29.6, 29.6, 29.6, 29.6, 29.5, 29.5, 29.5, 29.5, 29.5, 29.36, 29.36, 29.36, 29.36, 29.36, 29.35, 29.35, 29.35, 29.35, 29.35, 29.45, 29.45, 29.45, 29.45, 29.45, 29.47, 29.47, 29.47, 29.47, 29.47, 29.5, 29.5, 29.5, 29.5, 29.5, 29.48, 29.48, 29.48, 29.48, 29.48, 29.4, 29.4, 29.4, 29.4, 29.4, 29.42, 29.42, 29.42, 29.42, 29.42, 29.31, 29.31, 29.31, 29.31, 29.31, 29.48, 29.48, 29.48, 29.48, 29.48, 29.61, 29.61, 29.61, 29.61, 29.61, 29.74, 29.74, 29.74, 29.74, 29.74, 29.76, 29.76, 29.76]
                    
Loading

Details

kv_cache_usage_ratio

More
---
config:
    xyChart:
        titleFontSize: 12
        width: 900
        height: 600
    themeVariables:
        xyChart:
            titleColor: "#000000"
---
xychart-beta
    title "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3
 duration=10m 531 iterations"
    y-axis "llamacpp:kv_cache_usage_ratio"
    x-axis "llamacpp:kv_cache_usage_ratio" 1717523207 --> 1717523831
    line [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.15, 0.15, 0.15, 0.15, 0.15, 0.4, 0.4, 0.4, 0.4, 0.4, 0.21, 0.21, 0.21, 0.21, 0.21, 0.18, 0.18, 0.18, 0.18, 0.18, 0.16, 0.16, 0.16, 0.16, 0.16, 0.17, 0.17, 0.17, 0.17, 0.17, 0.16, 0.16, 0.16, 0.16, 0.16, 0.15, 0.15, 0.15, 0.15, 0.15, 0.18, 0.18, 0.18, 0.18, 0.18, 0.12, 0.12, 0.12, 0.12, 0.12, 0.14, 0.14, 0.14, 0.14, 0.14, 0.28, 0.28, 0.28, 0.28, 0.28, 0.29, 0.29, 0.29, 0.29, 0.29, 0.34, 0.34, 0.34, 0.34, 0.34, 0.37, 0.37, 0.37, 0.37, 0.37, 0.34, 0.34, 0.34, 0.34, 0.34, 0.19, 0.19, 0.19, 0.19, 0.19, 0.15, 0.15, 0.15, 0.15, 0.15, 0.24, 0.24, 0.24, 0.24, 0.24, 0.19, 0.19, 0.19, 0.19, 0.19, 0.19, 0.19, 0.19, 0.19, 0.19, 0.16, 0.16, 0.16, 0.16, 0.16, 0.22, 0.22, 0.22, 0.22, 0.22, 0.16, 0.16, 0.16, 0.16, 0.16, 0.19, 0.19, 0.19, 0.19, 0.19, 0.12, 0.12, 0.12, 0.12, 0.12, 0.15, 0.15, 0.15, 0.15, 0.15, 0.21, 0.21, 0.21, 0.21, 0.21, 0.18, 0.18, 0.18, 0.18, 0.18, 0.11, 0.11, 0.11, 0.11, 0.11, 0.17, 0.17, 0.17, 0.17, 0.17, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.17, 0.17, 0.17, 0.17, 0.17, 0.25, 0.25, 0.25, 0.25, 0.25, 0.18, 0.18, 0.18, 0.18, 0.18, 0.26, 0.26, 0.26, 0.26, 0.26, 0.22, 0.22, 0.22, 0.22, 0.22, 0.25, 0.25, 0.25, 0.25, 0.25, 0.09, 0.09, 0.09, 0.09, 0.09, 0.13, 0.13, 0.13, 0.13, 0.13, 0.17, 0.17, 0.17, 0.17, 0.17, 0.3, 0.3, 0.3, 0.3, 0.3, 0.4, 0.4, 0.4, 0.4, 0.4, 0.47, 0.47, 0.47, 0.47, 0.47, 0.39, 0.39, 0.39, 0.39, 0.39, 0.27, 0.27, 0.27, 0.27, 0.27, 0.21, 0.21, 0.21, 0.21, 0.21, 0.2, 0.2, 0.2, 0.2, 0.2, 0.13, 0.13, 0.13, 0.13, 0.13, 0.15, 0.15, 0.15, 0.15, 0.15, 0.18, 0.18, 0.18, 0.18, 0.18, 0.23, 0.23, 0.23, 0.23, 0.23, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.34, 0.34, 0.34, 0.34, 0.34, 0.09, 0.09, 0.09, 0.09, 0.09, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.13, 0.13, 0.13, 0.13, 0.13, 0.16, 0.16, 0.16]
                    
Loading
requests_processing
More
---
config:
    xyChart:
        titleFontSize: 12
        width: 900
        height: 600
    themeVariables:
        xyChart:
            titleColor: "#000000"
---
xychart-beta
    title "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3
 duration=10m 531 iterations"
    y-axis "llamacpp:requests_processing"
    x-axis "llamacpp:requests_processing" 1717523207 --> 1717523831
    line [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0, 5.0, 5.0, 4.0, 4.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 6.0, 6.0, 6.0, 6.0, 6.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 7.0, 7.0, 7.0, 7.0, 7.0, 5.0, 5.0, 5.0, 5.0, 5.0, 7.0, 7.0, 7.0, 7.0, 7.0, 6.0, 6.0, 6.0, 6.0, 6.0, 5.0, 5.0, 5.0, 5.0, 5.0, 6.0, 6.0, 6.0, 6.0, 6.0, 7.0, 7.0, 7.0, 7.0, 7.0, 6.0, 6.0, 6.0, 6.0, 6.0, 7.0, 7.0, 7.0, 7.0, 7.0, 6.0, 6.0, 6.0, 6.0, 6.0, 4.0, 4.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0, 5.0, 5.0, 3.0, 3.0, 3.0, 3.0, 3.0, 8.0, 8.0, 8.0, 8.0, 8.0, 7.0, 7.0, 7.0, 7.0, 7.0, 4.0, 4.0, 4.0, 4.0, 4.0, 7.0, 7.0, 7.0, 7.0, 7.0, 8.0, 8.0, 8.0, 8.0, 8.0, 5.0, 5.0, 5.0, 5.0, 5.0, 2.0, 2.0, 2.0, 2.0, 2.0, 5.0, 5.0, 5.0, 5.0, 5.0, 4.0, 4.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0, 5.0, 5.0, 6.0, 6.0, 6.0, 6.0, 6.0, 8.0, 8.0, 8.0, 8.0, 8.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 5.0, 5.0, 5.0, 5.0, 5.0, 2.0, 2.0, 2.0, 2.0, 2.0, 6.0, 6.0, 6.0, 6.0, 6.0, 5.0, 5.0, 5.0, 5.0, 5.0, 7.0, 7.0, 7.0, 7.0, 7.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 7.0, 7.0, 7.0, 7.0, 7.0, 6.0, 6.0, 6.0, 6.0, 6.0, 7.0, 7.0, 7.0, 7.0, 7.0, 3.0, 3.0, 3.0, 3.0, 3.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 7.0, 7.0, 7.0, 7.0, 7.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0, 4.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0, 4.0, 3.0, 3.0, 3.0]
                    
Loading

@0wwafa
Copy link
Author

0wwafa commented May 31, 2024

Why did one check fail?

@0wwafa
Copy link
Author

0wwafa commented May 31, 2024

LOL
Run editorconfig-checker
examples/server/server.cpp:
3038: Trailing whitespace

removed an emty line to please a tupid bot.
@0wwafa
Copy link
Author

0wwafa commented Jun 3, 2024

why is this still not accepted?

@ngxson
Copy link
Collaborator

ngxson commented Jun 4, 2024

After a second look on the PR, I don't think it's necessary to add this try..catch block. If there is a problem loading the model, ctx_server.load_model(params) already returns false. Instead, your try..catch mostly to wrap ctx_server.init()

In addition, it would be nice to have a concrete example of a model failed to load (i.e. example of model / command). It makes no sense to fix a problem that never occurs.

} catch (const std::exception &e) {
// Catch known standard exceptions
LOG_ERROR("unable to load model", {{"error", e.what()}});
exit(1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You forgot to change this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.. I give up... 3 days and still nothing for a simple modification..
also: on windows, the program crashes for every stupid error like putting a non existant model filename.
that just plain wrong.
all exceptions should be caught. mine was only an example.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

try {
// Attempt to load the model
if (!ctx_server.load_model(params)) {
LOG_ERROR("unable to load model", {{"error", "Load model failed"}});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is why it makes no sense to me: we already catch the unable to load model. Why do we need another try..catch to do that?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because if you do: server -m nonexistantfile the program couses a windows CRASH instead of just exiting with the error.
image

@0wwafa 0wwafa closed this Jun 5, 2024
@0wwafa 0wwafa deleted the patches branch June 5, 2024 11:09
@mofosyne
Copy link
Collaborator

mofosyne commented Jun 5, 2024

Everyone is a volunteer here. We do what we can. My understanding is that you thought merging soon means it will definitely be merged in. On second assessment I can see that the label really should be labeled as merge ready and is there to give other people in the team time to assess and provide feedback. I've made the adjustment to the label to make this clearer next time.

Anyway we appreciate the fact that you made an effort and will note down your recommendation separately as you have deleted this patch branch, to ensure you will be credited when we get around to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix fixes an issue or bug examples high severity Used to report high severity bugs in llama.cpp (Malfunctioning hinder important workflow) merge ready indicates that this may be ready to merge soon and is just holding out in case of objections Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix server
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants